The context window has been shattered: Subquadratic debuts a 12M token window


TLDR

  • Miami startup Subquadratic claims its SSA architecture hits 12M tokens with linear compute scaling, beating GPT-5.5 on MRCR v2 and Opus 4.6 on SWE-bench.

Key Takeaways

  • Subquadratic Selective Attention (SSA) scales linearly in compute and memory vs. quadratic cost in standard transformers; reported 52x faster than dense attention at 1M tokens.
  • MRCR v2 score of 83% beats GPT-5.5 (74.0%); needle-in-a-haystack retrieval at 12M tokens hits 92.1%; SWE-bench Verified at 82.4% edges out Opus 4.6 (81.4%) and Gemini 3.1 Pro (80.6%).
  • Key caveat: each benchmark was run single-pass due to inference cost, and the SWE-bench margin is partly attributed to harness configuration rather than to the model alone.
  • Shipping now: API with full 12M-token window and SubQ Code CLI agent; 50M-token window targeted for Q4 2026; no open weights.
  • Prior attempts in this category (Magic.dev's 100M-token LTM, Mamba, Longformer) all traded away retrieval quality or retained quadratic steps; SSA claims to avoid the indexer trap that makes selection in DeepSeek's NSA quadratic.
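The scaling claim above can be illustrated with back-of-envelope arithmetic. The sketch below is purely illustrative: the per-token constant `c` for the linear method is an assumption, not a published SSA parameter, and real speedups depend on hardware and implementation.

```python
# Illustrative comparison of dense (quadratic) vs. linear attention cost.
# The constant `c` below is a made-up assumption; SSA's actual per-token
# cost is not public.

def dense_attention_ops(n: int) -> int:
    """Pairwise token interactions in standard attention: O(n^2)."""
    return n * n

def linear_attention_ops(n: int, c: int = 4096) -> int:
    """Linear-scaling attention: O(n) with an assumed per-token constant c."""
    return n * c

n = 1_000_000  # 1M-token context
ratio = dense_attention_ops(n) / linear_attention_ops(n)
print(f"Theoretical dense/linear op ratio at {n:,} tokens: {ratio:,.0f}x")
```

Note that the raw FLOP ratio (about 244x under this assumed constant) is larger than the reported 52x wall-clock speedup at 1M tokens, which is typical: memory bandwidth and kernel overheads eat into theoretical gains.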

Hacker News Comment Review

  • Commenters are broadly skeptical: no technical report published, no weights released, and the VC-backed structure makes independent verification unlikely near-term.
  • The thread points to an earlier HN discussion with primary source links, suggesting the article itself is a secondary write-up of a claim that has not yet been peer-reviewed or reproduced.

Notable Comments

  • @refibrillator: flags a prior HN thread with primary sources and notes that, given the VC funding structure, no technical report or code release should be expected.
