Neighborhood Growth Determines Geometric Priors for Relational Representation Learning

Abstract

The problem of identifying geometric structure in heterogeneous, high-dimensional data is a cornerstone of representation learning. While there exists a large body of literature on the embeddability of canonical graphs, such as lattices or trees, the heterogeneity of the relational data typically encountered in practice limits the applicability of these classical methods. In this paper, we propose a combinatorial approach to evaluating embeddability, i.e., to decide whether a data set is best represented in Euclidean, Hyperbolic or Spherical space. Our method analyzes nearest-neighbor structures and local neighborhood growth rates to identify the geometric priors of suitable embedding spaces. For canonical graphs, the algorithm’s prediction provably matches classical results. As for large, heterogeneous graphs, we introduce an efficiently computable statistic that approximates the algorithm’s decision rule. We validate our method over a range of benchmark data sets and compare with recently published optimization-based embeddability methods.

Publication
International Conference on Artificial Intelligence and Statistics