Computer Use is 45x more expensive than structured APIs


TLDR

  • Benchmark pits a browser-use vision agent against auto-generated HTTP endpoints on the same admin panel: 551k tokens and 17 min for the vision path vs 12k tokens and 20 seconds for the API path.
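
The headline multiplier can be checked against the rounded figures in the TLDR. This is a minimal sketch that assumes "k" means thousands and that the quoted numbers are per-task totals; the variable names are illustrative and not taken from the benchmark.

```python
# Rounded figures quoted in the TLDR (assumed to be per-task totals).
vision_tokens = 551_000
api_tokens = 12_000
vision_seconds = 17 * 60  # 17 minutes
api_seconds = 20

token_ratio = vision_tokens / api_tokens
time_ratio = vision_seconds / api_seconds

print(f"token ratio: {token_ratio:.1f}x")  # token ratio: 45.9x
print(f"time ratio: {time_ratio:.0f}x")    # time ratio: 51x
```

With these rounded inputs the token ratio lands at roughly 46x, consistent with the roughly 45x in the title, and the wall-clock gap is slightly larger than the token gap.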

Key Takeaways

  • Vision path (Claude Sonnet + browser-use 0.12) needed a 14-step UI walkthrough in the prompt just to complete the task; without it, the agent never paginated and missed 3 of 4 pending reviews.
  • Step count is set by the interface, not the model. Better vision models cut error rate per screenshot but not screenshot count.
  • Token variance on the vision path was extreme: 407k to 751k input tokens across 3 runs, making cost estimates from a single trial unreliable.
  • API path (same Sonnet, same app handlers) ran almost identically each trial: 8 tool calls and ~12k tokens, varying by only ±27 tokens across 5 runs.
  • Claude Haiku finished the API path in under 8 seconds for under 10k tokens; it could not complete the vision path due to browser-use 0.12 structured-output schema failures.
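
To see why a single vision trial is a poor basis for a cost estimate, the reported extremes can be turned into a multiplier range. A minimal sketch, assuming the ~12k API figure as a fixed baseline and using only the minimum and maximum vision runs quoted above (the third run's value is not given here, so it is left out).

```python
# Reported extremes of the vision path across 3 runs (input tokens).
vision_min = 407_000
vision_max = 751_000
# API baseline, treated as constant because it varied by only ±27 tokens.
api_tokens = 12_000

spread = vision_max / vision_min       # how far apart two vision trials can be
low_ratio = vision_min / api_tokens    # best-case multiplier vs the API path
high_ratio = vision_max / api_tokens   # worst-case multiplier vs the API path

print(f"vision run-to-run spread: {spread:.2f}x")               # 1.85x
print(f"vision vs API: {low_ratio:.1f}x to {high_ratio:.1f}x")  # 33.9x to 62.6x
```

Depending on which single run is sampled, the same benchmark could be reported as anywhere from about 34x to about 63x more expensive.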

Hacker News Comment Review

  • Commenters broadly agreed the comparison is valid but argued vision agents should be a last resort for apps you do not control, not a default for internal tooling where an API or CLI is buildable.
  • Several builders noted the prompt-engineering cost of a 14-step walkthrough is real engineering work that never appears in token counts, making the stated 45x gap an undercount of true cost.
  • A recurring thread explored hybrid approaches: one agent maps the UI surface in a test environment and emits a structured workflow, which a second agent then executes via CLI or accessibility APIs rather than repeated screenshots.
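
The hybrid idea from the thread can be sketched as a structured workflow that one agent emits once and a second component replays without screenshots. Everything below is hypothetical: the JSON shape, the action names, and the in-memory REVIEWS table are stand-ins for illustration, not anything from the benchmark or from browser-use.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str
    args: dict


# Hypothetical output of the "mapping" agent: a plain JSON list of steps.
WORKFLOW_JSON = """
[
  {"action": "list_pending_reviews", "args": {}},
  {"action": "approve_review", "args": {"review_id": 2}}
]
"""

# Hypothetical in-memory stand-in for the admin panel's data.
REVIEWS = {1: "approved", 2: "pending", 3: "pending"}


def list_pending_reviews():
    """Return the ids of all pending reviews in ascending order."""
    return sorted(rid for rid, status in REVIEWS.items() if status == "pending")


def approve_review(review_id):
    """Mark one review as approved and return its id."""
    REVIEWS[review_id] = "approved"
    return review_id


# Only actions listed here can run; an unknown action raises KeyError.
HANDLERS = {
    "list_pending_reviews": list_pending_reviews,
    "approve_review": approve_review,
}


def execute(workflow_json):
    """Replay a structured workflow step by step and collect the results."""
    steps = [Step(**raw) for raw in json.loads(workflow_json)]
    return [HANDLERS[step.action](**step.args) for step in steps]


results = execute(WORKFLOW_JSON)
print(results)  # [[2, 3], 2]
```

The step count here is fixed by the workflow file rather than by how many screens the UI needs, which is the property commenters were after.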

Notable Comments

  • @theptip: “If anything I am impressed that it’s only 50x worse” – argues that computer use on DB state you own is categorically the wrong tool choice.
  • @angry_octet: Notes the benchmark implicitly documents an agent-hardening checklist: randomized labels, dynamic element positions, scroll traps – already standard in corporate SaaS.
