GPT-5.5 launches as OpenAI’s most capable agentic model, matching GPT-5.4 latency while outperforming it on coding, computer use, and knowledge work benchmarks.
Key Takeaways
State-of-the-art across Terminal-Bench 2.0 (82.7%), SWE-Bench Pro (58.6%), OSWorld-Verified (78.7%), and GDPval (84.9%) for agentic coding and knowledge work.
Matches GPT-5.4's per-token latency in production serving while using fewer tokens on Codex tasks, so the added capability does not come at the expense of speed or cost efficiency.
Available today to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex; API access requires additional safeguards and is arriving soon.
Scientific research gains: leads on GeneBench (multi-stage genetics data analysis) and BixBench (bioinformatics); contributed to a new proof on Ramsey numbers.
Scores 98.0% on Tau2-bench Telecom without prompt tuning and 88.5% on internal investment-banking modeling tasks, underscoring breadth across professional knowledge work.
Hacker News Comment Review
Commenters note that GPT-5.5’s 82.7% CyberGym score, available to all users, nearly matches the 83% of Anthropic’s access-gated Mythos, making GPT-5.5 the more accessible option for security practitioners.
Full benchmark comparisons against Mythos show GPT-5.5 trailing on SWE-Bench Pro (58.6% vs 77.8%) and GPQA Diamond, but competitive on Terminal-Bench 2.0 and OSWorld-Verified.
One commenter flags an 86% hallucination rate for GPT-5.5 on Artificial Analysis Omniscience versus Opus 4.7’s 36%, a substantial reliability gap for production deployments.
Notable Comments
@minimaxir: GPT-5.5 via Codex analyzed weeks of production traffic and wrote custom GPU scheduling heuristics that boosted token generation speeds by 20%, a self-optimization loop that standard benchmarks do not measure.
@Someone1234: The Codex pricing page shows tighter message limits for 5.5 than for 5.4 and 5.3, which could offset the claimed per-task token efficiency gains for high-volume users.
@simonw: GPT-5.5 already accessible via a Codex API backdoor used by OpenClaw, with apparent OpenAI approval, giving builders early programmatic access ahead of the official API launch.