Geometric Alignment via Teacher-Free Self-Distillation
The "Infinite Gap" and Why Softmax Keeps Me Up at Night
To understand any solution, we first have to understand the problem. I’ve spent the better part of my research career staring at loss curves, watching them dip, plateau, and occasionally spike catastrophically. We often treat the loss function as a black box, a simple signal telling the network "good dog" or "bad dog." But if you look closer, specifically at the geometry of the final layer, you realize that our standard tools, softmax included, are fundamentally broken.
