Apple Silicon costs LESS than OpenRouter

May 19, 2026 · ai ai-agents coding

TLDR

A reanalysis of M4 Max (128 GB) benchmarks finds that local Apple Silicon inference is about 14% cheaper than OpenRouter for dense models and roughly 3x cheaper for mixture-of-experts (MoE) models such as Gemma 4 26B.

The original cost comparison counted output tokens only. Blending in input tokens at a realistic 4:1 or 5:1 input-to-output ratio significantly lowers the local cost per million tokens, because prompt processing (prefill) runs much faster than generation on the same hardware.
Benchmarked with vllm bench at a concurrency of 4 on Gemma 4 31B (dense), the blended local cost is ~$0.14/M tokens versus ~$0.16/M on OpenRouter.
Switching to Gemma 4 26B (MoE) drops the local cost to ~$0.038/M tokens versus ~$0.10/M on OpenRouter, a gap of about 2.6x on these rounded figures, which the analysis rounds to ~3x.
The asset cost model credits the MacBook Pro's residual (resale) value, estimated at $1.5k–$2.5k after 3–5 years, which further improves the local economics.
Independent of price, privacy benefits and worsening GPU supply and access strengthen the case for running LLMs locally.
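To make the blending argument concrete, here is a minimal Python sketch of a cost model of the kind described above. It assumes that the machine's price net of residual value is amortized over its busy hours, that electricity is added on top, and that each request's time is prefill time for its input tokens plus decode time for its output tokens, using aggregate throughputs measured at the chosen concurrency. The names `Hardware`, `hourly_cost`, `local_cost_per_m` and `api_cost_per_m`, and every number in the example, are hypothetical placeholders, not values from the original analysis.

```python
"""Hypothetical cost model: local inference vs. a per-token API.

All numeric inputs are placeholder assumptions, not benchmark results.
"""
from dataclasses import dataclass


@dataclass
class Hardware:
    purchase_price: float       # USD paid for the machine
    residual_value: float       # USD expected on resale at end of life
    lifetime_years: float       # years the machine is kept
    busy_hours_per_year: float  # hours per year spent serving inference
    power_watts: float          # average power draw while serving
    usd_per_kwh: float          # electricity price


def hourly_cost(hw: Hardware) -> float:
    """USD per busy hour: depreciation net of resale value, plus energy."""
    depreciation = (hw.purchase_price - hw.residual_value) / (
        hw.lifetime_years * hw.busy_hours_per_year
    )
    energy = hw.power_watts / 1000 * hw.usd_per_kwh
    return depreciation + energy


def local_cost_per_m(hw: Hardware, prefill_tps: float, decode_tps: float,
                     input_per_output: float) -> float:
    """Blended USD per million tokens (input + output) served locally.

    Assumes prefill and decode run sequentially at the given aggregate
    throughputs (tokens/second).
    """
    # Time to process `input_per_output` input tokens plus one output token.
    seconds = input_per_output / prefill_tps + 1 / decode_tps
    tokens = input_per_output + 1
    return hourly_cost(hw) / 3600 * seconds / tokens * 1_000_000


def api_cost_per_m(input_price: float, output_price: float,
                   input_per_output: float) -> float:
    """Blended USD per million tokens for an API priced per million in/out."""
    r = input_per_output
    return (r * input_price + output_price) / (r + 1)


if __name__ == "__main__":
    # Placeholder hardware and prices, chosen only to exercise the functions.
    mbp = Hardware(purchase_price=5000, residual_value=2000, lifetime_years=4,
                   busy_hours_per_year=8760, power_watts=120, usd_per_kwh=0.15)
    for ratio in (0, 4, 5):
        local = local_cost_per_m(mbp, prefill_tps=2000, decode_tps=200,
                                 input_per_output=ratio)
        api = api_cost_per_m(input_price=0.10, output_price=0.40,
                             input_per_output=ratio)
        print(f"{ratio}:1  local ${local:.3f}/M  api ${api:.3f}/M")
```

Setting `input_per_output=0` reproduces an output-only comparison. Raising it to 4 or 5 spreads the same hourly cost over many more tokens whenever `prefill_tps` exceeds `decode_tps`, and a larger `residual_value` lowers `hourly_cost` directly. Substituting measured throughputs and real prices is what would turn this sketch into an actual comparison.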

The only comment on the Hacker News submission flags a reversed title, suggesting the submitted headline stated the opposite of what the analysis actually concludes.