A Theory of Deep Learning

· science · Source ↗

TLDR

  • Preprint from Stanford’s Diffusion Group claims to explain why deep learning generalizes, using output-space dynamical analysis and the empirical Neural Tangent Kernel (eNTK).

Key Takeaways

  • Paper (Litman & Guo, arXiv:2605.01172) reframes neural networks as dynamical systems in output space, abandoning parameter-space complexity bounds entirely.
  • Core mechanism: training fills a “signal channel” (high-eigenvalue modes of the integrated eNTK) while noise gets trapped in a test-invisible “reservoir” (kernel null space).
  • Benign overfitting, double descent, implicit bias, and grokking are all unified as different behaviors of noise moving between the signal channel and reservoir.
  • A one-line Adam modification – update parameter k only when its batch gradient signal exceeds a leave-one-out noise estimate – is reported to accelerate grokking 5x and improve DPO fine-tuning without a validation set.
  • Authors claim the theory enables training directly on population risk, jumping analytically to the final network state, and rethinking architecture around reservoir size vs. signal-channel capacity.
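The signal-channel/reservoir split can be illustrated with a toy kernel decomposition. A minimal sketch, assuming a linear model (whose NTK is constant and equals the Gram matrix X Xᵀ) rather than a deep network; the model choice and variable names are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: n samples in d dims with n > d, so the kernel has a null space.
n, d = 20, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)   # targets = signal + label noise

# For a linear model f(x) = w.x the (e)NTK is K = X X^T, constant in training.
K = X @ X.T
eigvals, eigvecs = np.linalg.eigh(K)        # eigenvalues in ascending order

# Split modes: the top-d eigenvalues span the "signal channel";
# the remaining n-d (numerically zero) span the "reservoir" / null space.
signal_modes = eigvecs[:, -d:]
reservoir_modes = eigvecs[:, :-d]

y_signal = signal_modes @ (signal_modes.T @ y)
y_reservoir = reservoir_modes @ (reservoir_modes.T @ y)

# Kernel gradient flow can only fit the signal-channel component of y;
# the reservoir component of the labels is invisible to the kernel.
print("reservoir eigenvalues ~ 0:", np.allclose(eigvals[:-d], 0, atol=1e-8))
print("orthogonal split of y:",
      np.isclose(y_signal @ y_signal + y_reservoir @ y_reservoir, y @ y))
```

For a deep network the eNTK changes during training, which is presumably why the paper works with an integrated kernel; this sketch only shows the eigenmode split itself.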
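The gated Adam rule can be sketched as a per-coordinate update mask. Everything below is an assumption about how “batch signal exceeds leave-one-out noise” might be operationalized; `gated_adam_step` and its leave-one-out spread estimate are hypothetical, not the authors’ implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def gated_adam_step(params, per_example_grads, m, v, t,
                    lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step that skips coordinates whose batch gradient looks
    like noise. per_example_grads has shape (B, P). The gate is a guess
    at the paper's rule: update parameter k only if |mean grad_k|
    exceeds the spread of its leave-one-out means."""
    B = per_example_grads.shape[0]
    g = per_example_grads.mean(axis=0)
    # Leave-one-out means: loo[i] = mean of the gradients excluding example i.
    loo = (g[None, :] * B - per_example_grads) / (B - 1)
    noise = np.abs(loo - g[None, :]).max(axis=0)    # worst-case LOO shift
    gate = np.abs(g) > noise                        # "signal exceeds noise"

    # Standard Adam moments and bias correction.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    step = lr * m_hat / (np.sqrt(v_hat) + eps)
    params = params - gate * step                   # masked update
    return params, m, v

# Toy usage: coordinate 0 has a consistent gradient, coordinate 1 is pure noise.
B, P = 32, 2
grads = np.stack([np.full(B, 1.0), rng.normal(size=B)], axis=1)
p, m, v = np.zeros(P), np.zeros(P), np.zeros(P)
p, m, v = gated_adam_step(p, grads, m, v, t=1)
print(p)  # coordinate 0 receives a full Adam step of size ≈ lr
```

Since the gate is computed from the batch itself, no held-out validation set is needed, which would be consistent with the paper’s claim about DPO fine-tuning.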

Hacker News Comment Review

  • No substantive HN discussion yet.

Original | Discuss on HN