Loss Landscape Degeneracy and Stagewise Development in Transformers

Publication
Transactions on Machine Learning Research (TMLR)