Exploiting geometric structure in matrix-valued optimization
Matrix-valued optimization tasks arise in many machine learning applications. Often, exploiting non-Euclidean structure in such problems can give rise to algorithms that are computationally superior to standard nonlinear programming approaches. In this talk, we consider the problem of optimizing Riemannian “difference of convex” functions. Several classical optimization problems involve objective functions of this form, including barycenter problems, the computation of HPD matrix square roots, as well as the computation of Brascamp-Lieb constants. The latter is of central importance in many areas of Mathematics and Computer Science through connections to maximum likelihood estimators in Gaussian models, Tyler’s M-estimator of scatter matrices and Operator Scaling. We discuss a class of CCCP-style algorithms for solving Riemannian “difference of convex” functions, where the geometric structure of the problem gives rise to an efficiently solvable fixed-point iteration. We present a detailed convergence analysis for the proposed approach and illustrate its advantages on the problem of computing Brascamp-Lieb constants. This talk is based on joint work with Suvrit Sra.