Discrete Curvature and Applications in Representation Learning @ Oxford Networks Seminar
The problem of identifying geometric structure in heterogeneous, high-dimensional data is a cornerstone of representation learning. In this talk, we study data geometry from the perspective of discrete geometry. We focus specifically on the analysis of relational data, i.e., data that is given as a graph or can be represented as one. We start by reviewing discrete notions of curvature, with a particular focus on discrete Ricci curvature. We then discuss the problem of embeddability: for downstream machine learning and data science applications, it is often beneficial to represent data in a continuous space, i.e., Euclidean, hyperbolic, or spherical space. How can we decide on a suitable representation space? While there is a large body of literature on the embeddability of canonical graphs, such as lattices or trees, the heterogeneity of real-world data limits the applicability of these classical methods. We discuss a combinatorial approach for evaluating embeddability, in which we analyze nearest-neighbor structures and local neighborhood growth rates to identify the geometric priors of suitable embedding spaces. For canonical graphs, the algorithm’s prediction provably matches classical results. For large, heterogeneous graphs, we introduce an efficiently computable statistic that approximates the algorithm’s decision rule. We validate our method on a range of benchmark data sets and compare it with recently published optimization-based embeddability methods.
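To make the notion of local neighborhood growth concrete, the following is a minimal Python sketch. It is not the algorithm or the statistic presented in the talk. It only counts how many nodes lie within graph distance r of a source node, for the two canonical examples named above: a lattice, where growth is polynomial, and a tree, where growth is exponential. The helper names `ball_sizes`, `growth_ratios`, `grid_graph`, and `binary_tree` are hypothetical and chosen for illustration.

```python
from collections import deque


def ball_sizes(adj, source, max_radius):
    """Return [|B(source, r)| for r = 0..max_radius], computed with BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_radius:
            continue  # do not expand beyond the largest radius of interest
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    sizes = [0] * (max_radius + 1)
    for d in dist.values():
        sizes[d] += 1  # number of nodes at exact distance d
    for r in range(1, max_radius + 1):
        sizes[r] += sizes[r - 1]  # accumulate spheres into balls
    return sizes


def growth_ratios(sizes):
    """Ratios |B(r)| / |B(r-1)|; these decay toward 1 under polynomial growth."""
    return [sizes[r] / sizes[r - 1] for r in range(1, len(sizes))]


def grid_graph(n):
    """n x n square lattice as an adjacency dict (illustrative helper)."""
    adj = {(i, j): [] for i in range(n) for j in range(n)}
    for i in range(n):
        for j in range(n):
            for a, b in ((i + 1, j), (i, j + 1)):
                if a < n and b < n:
                    adj[(i, j)].append((a, b))
                    adj[(a, b)].append((i, j))
    return adj


def binary_tree(depth):
    """Complete binary tree with heap-style node labels 1..2^(depth+1)-1."""
    n = 2 ** (depth + 1) - 1
    adj = {k: [] for k in range(1, n + 1)}
    for k in range(2, n + 1):
        adj[k].append(k // 2)
        adj[k // 2].append(k)
    return adj


grid_balls = ball_sizes(grid_graph(9), (4, 4), 4)  # from the lattice center
tree_balls = ball_sizes(binary_tree(4), 1, 4)      # from the tree root
print("grid:", grid_balls, growth_ratios(grid_balls))
print("tree:", tree_balls, growth_ratios(tree_balls))
```

On the lattice the ball sizes follow 2r^2 + 2r + 1, so the successive ratios shrink toward 1, which is consistent with a flat (Euclidean) prior. On the binary tree the ball sizes follow 2^(r+1) - 1, so the ratios stay close to 2, which is consistent with a hyperbolic prior. Real-world graphs mix both behaviors across nodes, which is why a per-node or aggregated statistic is needed in practice.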