Qwen3.7-Max: The Agent Frontier

TLDR

  • Alibaba’s Qwen3.7-Max is a proprietary agent-focused model claiming top-tier scores on SWE-Bench, MCP, and reasoning benchmarks, with a 35-hour autonomous kernel optimization demo.

Key Takeaways

  • Scores 80.4 on SWE-Bench Verified, 60.6 on SWE-Bench Pro, and 69.7 on Terminal Bench 2.0-Terminus, edging out DS-V4-Pro Max on the Terminal Bench result.
  • A 35-hour autonomous run on a novel T-Head ZW-M890 PPU achieved a 10x geometric-mean speedup on SGLang’s Extend Attention kernel, using 1,158 tool calls.
  • Cross-harness generalization comes from training that decouples the Task, Harness, and Verifier components; Qwen reports consistent performance across Claude Code, OpenClaw, Qwen Code, and custom scaffolds.
  • Environment scaling (more diverse agentic training environments) drives consistent benchmark gains; Qwen claims results generalize to unseen out-of-domain environments.
  • Available soon via Alibaba Cloud Model Studio API; no pricing or latency figures published yet.
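
For readers unfamiliar with the metric in the kernel demo, here is a minimal sketch of how a geometric-mean speedup is typically computed. The function name and the timings are hypothetical and chosen only so the arithmetic is easy to follow; they are not figures from Qwen's report.

```python
import math

def geometric_mean_speedup(baseline_ms, optimized_ms):
    """Geometric mean of per-workload speedups (baseline time / optimized time)."""
    if not baseline_ms or len(baseline_ms) != len(optimized_ms):
        raise ValueError("need two non-empty timing lists of equal length")
    ratios = [b / o for b, o in zip(baseline_ms, optimized_ms)]
    # Averaging in log space is equivalent to taking the n-th root of the product.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical timings for three workloads (illustrative only).
baseline = [100.0, 200.0, 400.0]
optimized = [20.0, 20.0, 20.0]   # per-workload speedups: 5x, 10x, 20x

speedup = geometric_mean_speedup(baseline, optimized)
print(f"geometric-mean speedup: {speedup:.2f}x")
```

The geometric mean is the usual choice for averaging ratios because a single outlier workload pulls it less than it would an arithmetic mean: in this toy case the arithmetic mean of the three speedups would be about 11.7x, while the geometric mean is 10x.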

Hacker News Comment Review

  • The main practical concern: Qwen3.7-Max is proprietary and API-only through Alibaba Cloud, with no US hyperscaler partnership, making production adoption harder for US-based teams.
  • Commenters flagged benchmark cherry-picking: comparisons use Opus-4.6 and other older versions rather than the latest competitor releases, a pattern across recent Qwen launches.
  • Community interest is high in open-weight releases, both in the 60-150B range (e.g., 122B) and at larger sizes (e.g., 397B); local users already report solid results running Qwen 3.6 27B on consumer hardware like a Tesla P40.

Notable Comments

  • @tekacs: Wants a US-domiciled API option via a major hyperscaler for production use; notes the asymmetry with US model access policies.
  • @XCSme: Pricing and latency still unpublished at announcement time.
