Apple Silicon costs LESS than OpenRouter

TLDR

  • Reanalysis of M4 Max 128GB benchmarks finds local Apple Silicon inference is 14% cheaper than OpenRouter for dense models and 3x cheaper with MoE models like Gemma 4 26B.

Key Takeaways

  • Original cost comparison used output-only tokens; a realistic 4:1 or 5:1 input-to-output ratio significantly lowers local cost per million tokens.
  • At concurrency 4, measured with vllm bench on Gemma 4 31B, the blended local cost is ~$0.14/M tokens vs OpenRouter’s ~$0.16/M tokens.
  • Switching to Gemma 4 26B (MoE) drops local cost to ~$0.038/M tokens vs OpenRouter’s ~$0.10/M tokens, a ~3x gap.
  • MacBook Pro residual value ($1.5k-$2.5k after 3-5 years) is factored into the asset cost model, improving local economics further.
  • Privacy benefits and worsening GPU supply/access conditions strengthen the case for local LLMs independent of price.
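
The cost reasoning in these takeaways can be sketched in a few lines of Python. This is a minimal sketch, not the original author's model: the function names, the straight-line amortization, and every number in the usage example are assumptions chosen for illustration. It shows why counting input tokens matters (prefill is usually much faster than decode, so blended throughput rises) and how residual value lowers the amortized hardware cost.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def blended_throughput(ratio, prefill_tps, decode_tps):
    """Tokens/s over input + output, given `ratio` input tokens per output token."""
    # Time to process `ratio` input tokens plus one output token.
    seconds = ratio / prefill_tps + 1 / decode_tps
    return (ratio + 1) / seconds


def local_cost_per_million(purchase_usd, residual_usd, lifetime_years,
                           utilization, power_watts, usd_per_kwh,
                           tokens_per_second):
    """Blended local cost in USD per million tokens.

    Assumes straight-line depreciation (purchase minus resale value)
    and that power is drawn only while the machine is serving tokens.
    """
    active_seconds = lifetime_years * SECONDS_PER_YEAR * utilization
    depreciation = purchase_usd - residual_usd
    energy_kwh = power_watts / 1000 * active_seconds / 3600
    total_cost = depreciation + energy_kwh * usd_per_kwh
    total_tokens = tokens_per_second * active_seconds
    return total_cost / total_tokens * 1_000_000


def blended_api_price(ratio, input_price, output_price):
    """Hosted price per million tokens at `ratio` input tokens per output token."""
    return (ratio * input_price + output_price) / (ratio + 1)


if __name__ == "__main__":
    # All values below are hypothetical placeholders, not measurements.
    tps = blended_throughput(ratio=4, prefill_tps=800, decode_tps=60)
    local = local_cost_per_million(
        purchase_usd=5000, residual_usd=2000, lifetime_years=4,
        utilization=0.5, power_watts=100, usd_per_kwh=0.15,
        tokens_per_second=tps,
    )
    hosted = blended_api_price(ratio=4, input_price=0.10, output_price=0.40)
    print(f"local ~${local:.3f}/M tokens, hosted ~${hosted:.3f}/M tokens")
```

Raising `residual_usd` or `ratio` in this sketch lowers the local figure, which is the direction of the effect the reanalysis describes; the size of the effect depends entirely on the inputs.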

Hacker News Comment Review

  • The sole comment flags a title reversal error in the original HN submission, suggesting the linked post may have the headline inverted from what the analysis actually concludes.
