Running local inference on an M5 Max MacBook Pro costs roughly 3x more per million tokens than OpenRouter, with hardware depreciation as the dominant factor.
Key Takeaways
At 10-40 tokens/sec on Gemma4 31b, amortized cost on a $4,299 M5 Max MBP ranges from $0.40 to $4.79 per million tokens depending on lifespan and speed assumptions.
OpenRouter serves Gemma4 31b at $0.38-0.50 per million tokens and hits 60-70 tokens/sec, roughly 1.5-7x the local M5 Max throughput depending on where in the 10-40 tokens/sec range the laptop lands.
Electricity is nearly negligible at ~$0.02/hr; hardware depreciation over a 5-year life runs $0.10-0.16/hr, making device cost the key variable.
A crossover where local undercuts OpenRouter exists only in the most optimistic scenario: 50W power draw, 40 tokens/sec, and a 10-year device life.
For salaried developers, token costs (local or cloud) are roughly 1000x cheaper than their own labor, making model quality the deciding factor.
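The amortization arithmetic behind these per-million-token figures can be sketched as follows. The duty cycles (15 vs 24 active hours/day) and the $0.25/kWh electricity rate are assumptions chosen to land near the article's stated range, not figures taken from the article itself.

```python
def cost_per_million_tokens(device_price, years, hours_per_day,
                            watts, elec_rate_kwh, tokens_per_sec):
    """Amortized local-inference cost in $ per million output tokens."""
    active_hours = years * 365 * hours_per_day
    hw_per_hour = device_price / active_hours          # depreciation
    elec_per_hour = (watts / 1000) * elec_rate_kwh     # energy
    mtok_per_hour = tokens_per_sec * 3600 / 1_000_000  # throughput
    return (hw_per_hour + elec_per_hour) / mtok_per_hour

# Pessimistic: 10 tok/s, 5-year life, ~15 active hr/day, 80 W
worst = cost_per_million_tokens(4299, 5, 15, 80, 0.25, 10)   # ≈ $4.92
# Optimistic: 40 tok/s, 10-year life, 24 hr/day, 50 W
best = cost_per_million_tokens(4299, 10, 24, 50, 0.25, 40)   # ≈ $0.43
```

Under these assumptions the two scenarios bracket roughly $0.43-4.92/Mtok, close to the article's $0.40-4.79 range; the spread comes almost entirely from the depreciation term, which is why lifespan and throughput dominate the comparison.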
Hacker News Comment Review
Commenters challenged the cost framing: the laptop purchase also delivers a usable workstation, so allocating full hardware cost to inference overstates local token price for anyone who would buy the machine anyway.
A key omission in the analysis: for agentic workloads, input tokens dominate total token count and are essentially free locally, which could significantly shift the break-even math.
Privacy, data control, and offline capability were cited as non-negotiable reasons to run local regardless of per-token cost comparisons.
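The input-token objection can be made concrete with a blended-price sketch: if local prefill is treated as near-free, the cloud price to compare against is not the headline output rate but the output rate plus input charges per output token. The $0.10/Mtok input price and the 2:1 and 20:1 input-to-output ratios below are hypothetical illustrations, not figures from the article or OpenRouter's actual pricing.

```python
def blended_cloud_cost(input_price, output_price, input_ratio):
    """Effective $ per million *output* tokens once input billing is
    folded in: each output token carries input_ratio billed input tokens."""
    return output_price + input_price * input_ratio

# Chat-style workload: short contexts, ~2 input tokens per output token
chat = blended_cloud_cost(0.10, 0.40, 2)    # $0.60 per M output tokens
# Agentic workload: large re-sent contexts, ~20 input tokens per output token
agent = blended_cloud_cost(0.10, 0.40, 20)  # $2.40 per M output tokens
```

On these assumed numbers, an agentic workload's effective cloud cost lands well above the article's $0.38-0.50 headline rate and inside the local cost range, which is the break-even shift the commenters describe.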
Notable Comments
@maho: Points out that input tokens dominate agentic workloads and are near-zero cost locally, a factor the article’s output-only analysis ignores.
@andai: Suggests the real metric is the cost delta between a model-capable laptop and whatever you would have bought anyway.