A preprint from Stanford's Diffusion Group claims to explain why deep learning generalizes, using output-space dynamical analysis and the empirical Neural Tangent Kernel (eNTK).
Key Takeaways
Paper (Litman & Guo, arXiv:2605.01172) reframes neural networks as dynamical systems in output space, abandoning parameter-space complexity bounds entirely.
Core mechanism: training fills a "signal channel" (high-eigenvalue modes of the integrated eNTK) while noise gets trapped in a test-invisible "reservoir" (the kernel's null space); a minimal sketch of this decomposition follows the list.
Benign overfitting, double descent, implicit bias, and grokking are all unified as different behaviors of noise moving between the signal channel and reservoir.
A one-line Adam modification (update parameter k only if the batch gradient signal exceeds a leave-one-out noise estimate) is claimed by the authors to accelerate grokking 5x and to improve DPO fine-tuning with no validation set; a hedged reconstruction appears below.
Authors claim the theory enables training directly on population risk, analytically jumping to the final network state, and rethinking architecture around reservoir size versus signal-channel capacity; the standard NTK closed form this presumably builds on is recalled at the end.
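The preprint's exact construction isn't reproduced here, but the signal-channel/reservoir split can be illustrated with a small sketch: treat the eNTK at one point in training as the Gram matrix of per-example gradients, eigendecompose it, and measure how much of the target vector lands in the top modes versus the near-null ones. Everything below (the function name `entk_split`, the `top_k` cutoff, the assumption of a scalar-output model) is my illustration, not the paper's code.

```python
import torch

def entk_split(model, X, y, top_k=10):
    """Split the energy of targets y across eNTK eigenmodes.

    Sketch only: assumes a scalar-output model and a dataset small
    enough to materialize per-example gradients explicitly.
    """
    grads = []
    for i in range(X.shape[0]):
        model.zero_grad()
        model(X[i:i + 1]).squeeze().backward()
        grads.append(torch.cat([p.grad.flatten() for p in model.parameters()]))
    G = torch.stack(grads)               # (n, num_params) per-example gradients
    K = G @ G.T                          # empirical NTK Gram matrix, shape (n, n)
    evals, evecs = torch.linalg.eigh(K)  # eigenvalues in ascending order
    signal = evecs[:, -top_k:]           # high-eigenvalue modes: "signal channel"
    reservoir = evecs[:, :-top_k]        # low/near-zero modes: the "reservoir"
    return (signal.T @ y).pow(2).sum(), (reservoir.T @ y).pow(2).sum()
```

On this picture, benign overfitting would show up as label noise landing mostly in the reservoir projection, where it cannot move test predictions.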
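The summary's description of the optimizer change is terse, so the following is a guess at its shape rather than the authors' code: gate each coordinate's Adam update on whether the batch-mean gradient exceeds a leave-one-out-style noise estimate (here approximated by the standard error of the mean; the paper's estimator may differ). The name `gated_adam_step` and the stacked `per_sample_grads` layout are my assumptions.

```python
import torch

def gated_adam_step(optimizer, per_sample_grads):
    """One gated step: update only coordinates where signal beats noise.

    per_sample_grads: (batch, num_params) tensor of flattened per-example
    gradients, in the same order as the optimizer's parameters.
    """
    b = per_sample_grads.shape[0]
    mean_g = per_sample_grads.mean(dim=0)                  # batch "signal"
    # Standard error of the mean as a stand-in for leave-one-out noise
    noise = per_sample_grads.std(dim=0, unbiased=True) / b ** 0.5
    gated = mean_g * (mean_g.abs() > noise)                # the claimed "one line"
    offset = 0
    for group in optimizer.param_groups:
        for p in group["params"]:
            n = p.numel()
            p.grad = gated[offset:offset + n].view_as(p)
            offset += n
    optimizer.step()
```

Per-sample gradients can be obtained with torch.func.grad plus torch.func.vmap; note that the gate uses only batch statistics, which is presumably what lets the method drop the validation set.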
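If the "analytic jump to the final network state" builds on standard NTK linearization (an assumption on my part; the preprint may do something different), the relevant closed form is the kernel-regression solution for a linearized network trained to convergence on squared loss:

$$
f_\infty(x) = f_0(x) + \Theta(x, X)\,\Theta(X, X)^{-1}\bigl(y - f_0(X)\bigr),
$$

where $\Theta$ is the eNTK at initialization, $X$ and $y$ are the training inputs and targets, and $f_0$ is the network at initialization. The catch is that this holds only while the kernel stays approximately fixed during training, which the empirical NTK of a finite network generally does not.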