Running local inference on an M5 Max MacBook Pro costs roughly 3x more per million tokens than OpenRouter, with hardware depreciation as the dominant factor.
Key Takeaways
At 10-40 tokens/sec on Gemma4 31b, amortized cost on a $4,299 M5 Max MBP ranges from $0.40 to $4.79 per million tokens depending on lifespan and speed assumptions.
OpenRouter serves Gemma4 31b at $0.38-0.50 per million tokens and hits 60-70 tokens/sec, roughly 1.5-7x the local M5 Max throughput depending on where in the 10-40 tokens/sec range the laptop lands.
Electricity is nearly negligible at ~$0.02/hr; hardware depreciation over a 5-year life runs $0.10-0.16/hr, making device cost the key variable.
A crossover where local undercuts OpenRouter exists only in the most optimistic scenario: 50W power draw, 40 tokens/sec, and a 10-year device life.
For salaried developers, token costs (local or cloud) are roughly 1000x cheaper than their own labor, making model quality the deciding factor.
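The amortization arithmetic behind these per-million-token figures can be sketched as follows. The duty cycles (15 vs 24 active hours/day) and the $0.25/kWh electricity rate are assumptions chosen to land near the article's stated range, not figures taken from the article itself.

```python
def cost_per_million_tokens(device_price, years, hours_per_day,
                            watts, elec_rate_kwh, tokens_per_sec):
    """Amortized local-inference cost in $ per million output tokens."""
    active_hours = years * 365 * hours_per_day
    hw_per_hour = device_price / active_hours          # depreciation
    elec_per_hour = (watts / 1000) * elec_rate_kwh     # energy
    mtok_per_hour = tokens_per_sec * 3600 / 1_000_000  # throughput
    return (hw_per_hour + elec_per_hour) / mtok_per_hour

# Pessimistic: 10 tok/s, 5-year life, ~15 active hr/day, 80 W
worst = cost_per_million_tokens(4299, 5, 15, 80, 0.25, 10)   # ≈ $4.92
# Optimistic: 40 tok/s, 10-year life, 24 hr/day, 50 W
best = cost_per_million_tokens(4299, 10, 24, 50, 0.25, 40)   # ≈ $0.43
```

Under these assumptions the two scenarios bracket roughly $0.43-4.92/Mtok, close to the article's $0.40-4.79 range; the spread comes almost entirely from the depreciation term, which is why lifespan and throughput dominate the comparison.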
Hacker News Comment Review
Commenters challenged the cost framing: the laptop purchase also delivers a usable workstation, so allocating full hardware cost to inference overstates local token price for anyone who would buy the machine anyway.
A key omission in the analysis: for agentic workloads, input tokens dominate total token count and are essentially free locally, which could significantly shift the break-even math.
Privacy, data control, and offline capability were cited as non-negotiable reasons to run local regardless of per-token cost comparisons.
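The input-token objection can be made concrete with a blended-price sketch: if local prefill is treated as near-free, the cloud price to compare against is not the headline output rate but the output rate plus input charges per output token. The $0.10/Mtok input price and the 2:1 and 20:1 input-to-output ratios below are hypothetical illustrations, not figures from the article or OpenRouter's actual pricing.

```python
def blended_cloud_cost(input_price, output_price, input_ratio):
    """Effective $ per million *output* tokens once input billing is
    folded in: each output token carries input_ratio billed input tokens."""
    return output_price + input_price * input_ratio

# Chat-style workload: short contexts, ~2 input tokens per output token
chat = blended_cloud_cost(0.10, 0.40, 2)    # $0.60 per M output tokens
# Agentic workload: large re-sent contexts, ~20 input tokens per output token
agent = blended_cloud_cost(0.10, 0.40, 20)  # $2.40 per M output tokens
```

On these assumed numbers, an agentic workload's effective cloud cost lands well above the article's $0.38-0.50 headline rate and inside the local cost range, which is the break-even shift the commenters describe.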
Notable Comments
@maho: Points out that input tokens dominate agentic workloads and are near-zero cost locally, a factor the article’s output-only analysis ignores.
@andai: Suggests the real metric is the cost delta between a model-capable laptop and whatever you would have bought anyway.