I am an Assistant Professor of Applied Mathematics and of Computer Science at Harvard. My research focuses on exploiting geometric structure in data to design efficient Machine Learning and Optimization methods.
In 2021–2022, I was a Hooke Research Fellow at the Mathematical Institute in Oxford and a Nicolas Kurti Junior Research Fellow at Brasenose College. In Fall 2021, I was a Research Fellow at the Simons Institute in Berkeley, where I participated in the program Geometric Methods for Optimization and Sampling. Previously, I received my PhD from Princeton University (2021) under the supervision of Charles Fefferman, held visiting positions at MIT and the Max Planck Institute for Mathematics in the Sciences, and interned in the research labs of Facebook, Google, and Microsoft. I am also interested in applications of Artificial Intelligence in the legal space and am the Chief Scientist of the startup Claudius Legal Intelligence.
PhD in Applied Mathematics, 2021
Princeton University
BSc/MSc in Mathematics and Physics, 2016
University of Leipzig
MSc in Applied Mathematics (during year abroad), 2015
University of Washington
We study projection-free methods for constrained Riemannian optimization. In particular, we propose a Riemannian Frank-Wolfe (RFW) method that handles constraints directly, in contrast to prior methods that rely on (potentially costly) projections. We analyze non-asymptotic convergence rates of RFW to an optimum for geodesically convex problems, and to a critical point for nonconvex objectives. We also present a practical setting under which RFW can attain a linear convergence rate. As a concrete example, we specialize RFW to the manifold of positive definite matrices and apply it to two tasks: (i) computing the matrix geometric mean (Riemannian centroid); and (ii) computing the Bures-Wasserstein barycenter. Both tasks involve geodesically convex interval constraints, for which we show that the Riemannian “linear” oracle required by RFW admits a closed-form solution; this result may be of independent interest. We complement our theoretical results with an empirical comparison of RFW against state-of-the-art Riemannian optimization methods, and observe that RFW performs competitively on the task of computing Riemannian centroids.
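To make the RFW iteration concrete, here is a minimal numerical sketch for the Riemannian centroid task on the manifold of positive definite matrices with the affine-invariant metric. The function names are illustrative choices, and the finite candidate set is a stand-in for the feasible region; the sketch does not reproduce the paper's closed-form oracle for interval constraints.

```python
import numpy as np

def _powm_spd(M, p):
    # Fractional power of a symmetric positive definite matrix.
    w, V = np.linalg.eigh(M)
    return (V * w**p) @ V.T

def spd_log(X, Y):
    # Riemannian log map at X (affine-invariant metric):
    # Log_X(Y) = X^{1/2} logm(X^{-1/2} Y X^{-1/2}) X^{1/2}.
    Xh, Xih = _powm_spd(X, 0.5), _powm_spd(X, -0.5)
    w, V = np.linalg.eigh(Xih @ Y @ Xih)
    return Xh @ ((V * np.log(w)) @ V.T) @ Xh

def spd_geodesic(X, Z, t):
    # Point at parameter t on the geodesic from X (t=0) to Z (t=1):
    # gamma(t) = X^{1/2} (X^{-1/2} Z X^{-1/2})^t X^{1/2}.
    Xh, Xih = _powm_spd(X, 0.5), _powm_spd(X, -0.5)
    return Xh @ _powm_spd(Xih @ Z @ Xih, t) @ Xh

def rfw_centroid(As, X0, candidates, n_iters=100):
    """Toy RFW loop for the Riemannian centroid of SPD matrices `As`.

    `candidates` is an illustrative stand-in for the feasible set: we pick
    the candidate minimizing the linearized objective, rather than using
    the closed-form interval-constraint oracle derived in the paper.
    """
    X = X0
    for k in range(n_iters):
        # Riemannian gradient of f(X) = sum_i d(X, A_i)^2 under the
        # affine-invariant metric: grad f(X) = -2 sum_i Log_X(A_i).
        grad = -2 * sum(spd_log(X, A) for A in As)
        Xi = np.linalg.inv(X)
        # "Linear" oracle: minimize <grad f(X), Log_X(Z)>_X over feasible Z,
        # where the metric inner product is <U, V>_X = tr(X^-1 U X^-1 V).
        scores = [np.trace(Xi @ grad @ Xi @ spd_log(X, Z)) for Z in candidates]
        Z_k = candidates[int(np.argmin(scores))]
        # Classical Frank-Wolfe step size, moving along the geodesic to Z_k.
        X = spd_geodesic(X, Z_k, 2.0 / (k + 2.0))
    return X
```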
We study geodesically convex (g-convex) problems that can be written as a difference of Euclidean convex functions. This structure arises in several optimization problems in statistics and machine learning, e.g., for matrix scaling, M-estimators for covariances, and Brascamp-Lieb inequalities. Our work offers efficient algorithms that, on the one hand, exploit g-convexity to ensure global optimality along with guarantees on iteration complexity. On the other hand, the split structure permits us to develop Euclidean Majorization-Minimization algorithms that help us bypass the need to compute expensive Riemannian operations such as exponential maps and parallel transport. We illustrate our results by specializing them to a few concrete optimization problems that have been previously studied in the machine learning literature. Ultimately, we hope our work helps motivate the broader search for mixed Euclidean-Riemannian optimization algorithms.
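As a minimal illustration of the Majorization-Minimization (CCCP-style) step that such a difference-of-convex split enables, consider a toy, purely Euclidean instance f = g − h with g and h convex. The function names and the specific choices of g and h below are illustrative, not taken from the paper.

```python
import numpy as np

def cccp(grad_h, argmin_surrogate, x0, n_iters=50):
    """Generic difference-of-convex iteration for f = g - h (g, h convex).

    Each step replaces -h by its linearization at x_k (which majorizes f,
    since h lies above its tangents) and minimizes the convex surrogate:
        x_{k+1} = argmin_x  g(x) - <grad h(x_k), x>.
    """
    x = x0
    for _ in range(n_iters):
        x = argmin_surrogate(grad_h(x))
    return x

# Toy instance (illustrative choice): g(x) = 0.5 ||x||^2 and
# h(x) = log-sum-exp(x), both convex. Minimizing the surrogate
# 0.5 ||x||^2 - <c, x> gives x = c, so each step is x_{k+1} = softmax(x_k).
def grad_h(x):
    e = np.exp(x - x.max())   # numerically stable softmax = grad of log-sum-exp
    return e / e.sum()

x_star = cccp(grad_h, lambda c: c, np.random.randn(5))
```

Note that the loop uses only Euclidean operations: no exponential maps or parallel transports appear, which is the computational point of the split structure.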
We exhibit optimal control strategies for a simple toy problem in which the underlying dynamics depend on a parameter that is initially unknown and must be learned. We consider a cost function posed over a finite time interval, in contrast to much previous work that considers asymptotics as the time horizon tends to infinity. We study several different versions of the problem, including Bayesian control, in which we assume a prior distribution on the unknown parameter; and “agnostic” control, in which we assume nothing about the unknown parameter. For the agnostic problems, we compare our performance with that of an opponent who knows the value of the parameter. This comparison gives rise to several notions of “regret,” and we obtain strategies that minimize the “worst-case regret” arising from the most unfavorable choice of the unknown parameter. In every case, the optimal strategy turns out to be a Bayesian strategy or a limit of Bayesian strategies.
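For concreteness, one standard way to formalize the worst-case regret described above (the notation here is illustrative, not the paper's): write J(σ, a) for the expected cost of a strategy σ over the finite horizon when the unknown parameter equals a, and J*(a) = inf_σ J(σ, a) for the best cost attainable with foreknowledge of a. A worst-case-regret-optimal strategy then solves

```latex
\[
  \mathrm{Regret}(\sigma, a) = J(\sigma, a) - J^{*}(a),
  \qquad
  \sigma^{\star} \in \arg\min_{\sigma} \, \sup_{a} \, \mathrm{Regret}(\sigma, a).
\]
```

This additive form is only one of the several regret notions alluded to above; the minimax structure is what pits the controller against the most unfavorable choice of the unknown parameter.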
We introduce Forman-Ricci curvature and its corresponding flow as edge-based characteristics for complex networks, aiming to extend the common node-based approach to network analysis. Following a theoretical introduction and mathematical motivation, we apply the proposed network-analytic methods to static and dynamic complex networks and compare the results with established node-based characteristics. Our work suggests a number of applications for data mining, including denoising and clustering of experimental data, as well as extrapolation of network evolution.
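For intuition: on an unweighted graph (unit edge and node weights, no higher-order cells), Forman's curvature of an edge (u, v) reduces to the degree formula F(u, v) = 4 − deg(u) − deg(v). A short sketch, with function names of my own choosing, computes it with networkx:

```python
import networkx as nx

def forman_curvature(G):
    """Forman-Ricci curvature of each edge of an unweighted graph.

    With unit weights, Forman's formula simplifies to
    F(u, v) = 4 - deg(u) - deg(v) for each edge (u, v).
    """
    return {(u, v): 4 - G.degree(u) - G.degree(v) for u, v in G.edges()}

G = nx.karate_club_graph()
curv = forman_curvature(G)
# Edges with very negative curvature tend to act as bridges between
# densely connected regions, which is what makes the curvature useful
# for denoising and clustering.
print(sorted(curv.items(), key=lambda kv: kv[1])[:5])
```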