GPT-5.5

TLDR

  • GPT-5.5 launches as OpenAI’s most capable agentic model, matching GPT-5.4 latency while outperforming it on coding, computer use, and knowledge work benchmarks.

Key Takeaways

  • State-of-the-art across Terminal-Bench 2.0 (82.7%), SWE-Bench Pro (58.6%), OSWorld-Verified (78.7%), and GDPval (84.9%) for agentic coding and knowledge work.
  • Matches GPT-5.4 per-token latency in production serving while using fewer tokens on Codex tasks, improving both capability and cost efficiency.
  • Available today to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex; API access requires additional safeguards and is arriving soon.
  • Scientific research gains: leads on GeneBench (multi-stage genetics data analysis) and BixBench (bioinformatics); contributed to a new proof on Ramsey numbers.
  • Scores 98.0% on Tau2-bench Telecom without prompt tuning and 88.5% on an internal investment-banking modeling evaluation, underscoring breadth across professional knowledge work.

Hacker News Comment Review

  • GPT-5.5’s 82.7% CyberGym score, open to all users, directly rivals Anthropic Mythos’s gated 83%, making it the accessible choice for security practitioners.
  • Full benchmark comparisons against Mythos show GPT-5.5 trailing on SWE-Bench Pro (58.6% vs 77.8%) and GPQA Diamond, but competitive on Terminal-Bench 2.0 and OSWorld-Verified.
  • One commenter flags an 86% hallucination rate for GPT-5.5 on Artificial Analysis Omniscience versus Opus 4.7’s 36%, a substantial reliability gap for production deployments.
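
To put the commenters' figures side by side, the gaps can be worked out in percentage points. This is a minimal sketch that uses only the numbers quoted in the bullets above, none of which are independently verified here; the `comparisons` dictionary and the `point_gap` helper are illustrative names, not part of any benchmark tooling.

```python
# Scores as quoted in the Hacker News comment summaries (unverified).
# Each entry maps a label to (GPT-5.5 figure, rival figure), in percent.
comparisons = {
    "CyberGym, vs Mythos": (82.7, 83.0),
    "SWE-Bench Pro, vs Mythos": (58.6, 77.8),
    "Omniscience hallucination rate, vs Opus 4.7": (86.0, 36.0),
}


def point_gap(gpt_score: float, rival_score: float) -> float:
    """Return the GPT-5.5 figure minus the rival figure, in percentage points."""
    return round(gpt_score - rival_score, 1)


gaps = {label: point_gap(a, b) for label, (a, b) in comparisons.items()}

for label, gap in gaps.items():
    # A negative gap means GPT-5.5's figure is lower than the rival's.
    print(f"{label}: {gap:+.1f} points")
```

A lower figure is worse on the two capability benchmarks but better for hallucination rate, so the sign of each gap has to be read against the metric it belongs to.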

Notable Comments

  • @minimaxir: GPT-5.5 via Codex analyzed weeks of production traffic and wrote custom GPU scheduling heuristics that boosted token generation speeds by 20%, a self-optimization loop not measured by standard benchmarks.
  • @Someone1234: The Codex pricing page shows tighter message limits for GPT-5.5 than for 5.4 and 5.3, potentially offsetting the claimed per-task token efficiency gains for high-volume users.
  • @simonw: GPT-5.5 already accessible via a Codex API backdoor used by OpenClaw, with apparent OpenAI approval, giving builders early programmatic access ahead of the official API launch.
