Alibaba’s Qwen3.7-Max is a proprietary, agent-focused model with claimed top-tier scores on SWE-Bench, MCP, and reasoning benchmarks, plus a 35-hour autonomous kernel optimization demo.
Key Takeaways
Scores 80.4 on SWE-Verified, 60.6 on SWE-Pro, and 69.7 on Terminal Bench 2.0-Terminus, edging DS-V4-Pro Max on the last of these.
Autonomous 35-hour run on a novel T-Head ZW-M890 PPU achieved 10x geometric mean speedup on SGLang’s Extend Attention kernel via 1,158 tool calls.
Cross-harness generalization is trained by decoupling the Task, Harness, and Verifier components; Qwen reports consistent performance across Claude Code, OpenClaw, Qwen Code, and custom scaffolds.
Environment scaling (more diverse agentic training environments) drives consistent benchmark gains; Qwen claims results generalize to unseen out-of-domain environments.
Available soon via Alibaba Cloud Model Studio API; no pricing or latency figures published yet.
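The "10x geometric mean speedup" in the takeaways above is an aggregate over many benchmark cases, not a single measurement. As a minimal sketch of how such a figure is typically computed (the function name and the timings below are hypothetical illustrations, not data from the Qwen announcement):

```python
import math


def geometric_mean_speedup(baseline_ms, optimized_ms):
    """Geometric mean of per-case speedups (baseline time / optimized time)."""
    if not baseline_ms or len(baseline_ms) != len(optimized_ms):
        raise ValueError("need two non-empty timing lists of equal length")
    # Average the logs of the ratios, then exponentiate; this equals the
    # n-th root of the product of the per-case speedups.
    log_sum = sum(math.log(b / o) for b, o in zip(baseline_ms, optimized_ms))
    return math.exp(log_sum / len(baseline_ms))


# Hypothetical kernel timings in milliseconds (NOT from the announcement).
baseline = [4.0, 16.0, 64.0]
optimized = [2.0, 2.0, 2.0]

# Per-case speedups are 2x, 8x, and 32x.
geo = geometric_mean_speedup(baseline, optimized)  # about 8.0
arith = sum(b / o for b, o in zip(baseline, optimized)) / len(baseline)  # 14.0
print(f"geometric mean: {geo:.2f}x, arithmetic mean: {arith:.2f}x")
```

The geometric mean is the usual choice for averaging ratios because one outlier case cannot dominate it the way it dominates an arithmetic mean. Python's standard library also provides `statistics.geometric_mean`, which can be applied to the list of per-case speedups directly.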
Hacker News Comment Review
The main practical concern: Qwen3.7-Max is proprietary and API-only through Alibaba Cloud, with no US hyperscaler partnership, making production adoption harder for US-based teams.
Commenters flagged benchmark cherry-picking: comparisons use Opus-4.6 and other older versions rather than the latest competitor releases, a pattern across recent Qwen launches.
Community interest is high in open-weight releases in the 60-150B range and larger (e.g., 122B, 397B), with local users already reporting solid results running Qwen 3.6 27B on consumer hardware such as a Tesla P40.
Notable Comments
@tekacs: Wants a US-domiciled API option via a major hyperscaler for production use; notes the asymmetry with US model access policies.
@XCSme: Pricing and latency still unpublished at announcement time.