Riemannian Frank-Wolfe Methods with Applications
Many applications involve non-Euclidean data, such as graphs, strings or matrices. In such cases, exploiting Riemannian geometry can deliver algorithms that are computationally superior to standard nonlinear programming approaches. This observation has resulted in an increasing interest in Riemannian methods in the optimization and machine learning community. Research on Riemannian optimization (especially constraint optimization) has largely focused on projected-gradient methods. In this talk we introduce Riemannian Frank-Wolfe (RFW) methods as a class of projection-free algorithms for geodesically convex and nonconvex problems. In contrast to projected-gradient methods, RFW is guaranteed to stay in the feasible region, circumventing the need to compute costly projections. At the heart of the algorithm lies a Riemannian ’linear’ oracle that determines the update conditioned on geodesically convex constraints. Developing means of solving this “linear” oracle efficiently is crucial to the competitiveness of the method. While in general a nonconvex semi-definite problem, we discuss matrix-valued tasks where the solution can be computed in closed form. RFW extends naturally to stochastic settings, where we discuss both purely stochastic and finite-sum problems, the latter including empirical risk minimization. Furthermore, we show that RFW converges at a nonasymptotic sublinear rate, recovering the best known guarantees for its Euclidean counterpart. Finally, we discuss two benchmark tasks, both of which are crucial subroutines in many machine learning methods: The computation of (i) Riemannian centroids and (ii) Wasserstein barycenters. For both problems, we discuss efficient implementations of the RFW oracle and compare against state-of-the-art methods, where we observe significant performance gains. Based on joint work with Suvrit Sra (MIT).