Apple Silicon costs more than OpenRouter


TLDR

  • Running local inference on an M5 Max MacBook Pro costs roughly 3x more per million tokens than OpenRouter, with hardware depreciation as the dominant factor.

Key Takeaways

  • At 10-40 tokens/sec on Gemma4 31b, amortized cost on a $4,299 M5 Max MBP ranges from $0.40 to $4.79 per million tokens depending on lifespan and speed assumptions.
  • OpenRouter serves Gemma4 31b at $0.38-0.50 per million tokens and delivers 60-70 tokens/sec, well above the 10-40 tokens/sec local M5 Max range.
  • Electricity is nearly negligible at ~$0.02/hr; hardware cost at 5-year depreciation runs $0.10-0.16/hr, making device cost the key variable.
  • A crossover in favor of local exists only in the most optimistic scenario: 50W power draw, 40 tokens/sec, and a 10-year device life.
  • For salaried developers, token costs (local or cloud) are roughly 1000x cheaper than their own labor, making model quality the deciding factor.
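The amortization arithmetic behind the takeaways above can be sketched in a few lines. The electricity rate and the pessimistic-case power draw below are illustrative assumptions, not figures from the article, so the outputs only approximate its $0.40-$4.79 range:

```python
def cost_per_million_tokens(hw_price_usd, lifespan_years, power_watts,
                            elec_usd_per_kwh, tokens_per_sec):
    """Amortized $/Mtok: hardware depreciation plus electricity,
    assuming the machine generates tokens continuously."""
    lifespan_hours = lifespan_years * 365 * 24
    hw_per_hour = hw_price_usd / lifespan_hours
    elec_per_hour = (power_watts / 1000) * elec_usd_per_kwh
    hours_per_mtok = 1_000_000 / tokens_per_sec / 3600
    return (hw_per_hour + elec_per_hour) * hours_per_mtok

# Pessimistic: 5-year life, 10 tok/s, assumed 100 W at $0.20/kWh
print(round(cost_per_million_tokens(4299, 5, 100, 0.20, 10), 2))   # 3.28
# Optimistic: 10-year life, 40 tok/s, 50 W (the article's crossover case)
print(round(cost_per_million_tokens(4299, 10, 50, 0.20, 40), 2))   # 0.41
```

The optimistic case lands near the article's $0.40 floor and just above OpenRouter's $0.38 price, which is why the crossover only appears under those assumptions.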

Hacker News Comment Review

  • Commenters challenged the cost framing: the laptop is also a usable workstation, so charging its full price to inference overstates the local token cost for anyone who would buy the machine anyway.
  • A key omission in the analysis: for agentic workloads, input tokens dominate total token count and are essentially free locally, which could significantly shift the break-even math.
  • Privacy, data control, and offline capability were cited as non-negotiable reasons to run local regardless of per-token cost comparisons.
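The input-token argument above can be illustrated with a toy comparison. All prices here are hypothetical placeholders (the article quotes output pricing only), and local prefill is treated as free, per the comment's claim:

```python
def cloud_cost_per_mtok_output(input_ratio, in_price, out_price):
    """Cloud $ per 1M output tokens when each output token carries
    `input_ratio` input tokens (e.g. long agentic contexts)."""
    return input_ratio * in_price + out_price

LOCAL = 1.00  # assumed local $/Mtok, output-decode-bound; prefill ~free

# Chat-like 1:1 ratio, hypothetical cloud prices $0.10 in / $0.40 out:
print(cloud_cost_per_mtok_output(1, 0.10, 0.40))   # 0.5 -> cloud cheaper
# Agentic 20:1 ratio: input charges dominate the cloud bill
print(cloud_cost_per_mtok_output(20, 0.10, 0.40))  # 2.4 -> local cheaper
```

Under these made-up prices, a high input:output ratio flips the comparison even though local output tokens cost twice as much, which is the shift in break-even math the comment describes.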

Notable Comments

  • @maho: Points out that input tokens dominate agentic workloads and are near-zero cost locally, a factor the article’s output-only analysis ignores.
  • @andai: Suggests the real metric is the cost delta between a model-capable laptop and whatever you would have bought anyway.

Original | Discuss on HN