Towards distillation guarantees under algorithmic alignment for combinatorial optimization

Abstract

Distillation can transfer knowledge from a large source model into a smaller model better suited for deployment, but theoretical guarantees are limited for structured prediction and combinatorial optimization. This work studies distillation when the target model is a graph neural network aligned with a dynamic programming algorithm for the task. Under a linear representation hypothesis for the source model, it shows that distillation can be solved efficiently in the complexity parameters of the dynamic-programming transition function. The result gives a sufficient condition for successful distillation through algorithmic alignment.

Publication
Preliminary version in NeurIPS Workshop on Differentiable Learning of Combinatorial Algorithms