Priors in Time: Missing Inductive Biases for Language Model Interpretability
Interpretability methods for language models often seek meaningful concepts in activations while treating features as independent …