LLMCap – A proxy that hard-stops LLM API calls when you hit a dollar cap

· ai web · Source ↗

TLDR

  • LLMCap is a paid proxy service that enforces hard dollar caps on LLM API calls, rejecting requests with HTTP 429 once a budget is hit.

Key Takeaways

  • Supports Anthropic, OpenAI, Google Gemini, Mistral, and Cohere via a single base URL swap; adds under 35ms latency.
  • Hard enforcement means the next request after cap is hit is blocked before reaching the provider, so no token is consumed or billed.
  • Streaming is supported; if cap is hit mid-stream, the connection closes and a 429 SSE event is sent, and the triggering token is not charged.
  • Provider API keys are passed through per-request and immediately discarded; only a bcrypt-hashed LLMCap proxy key is stored.
  • Tooling includes a VS Code extension, PyPI CLI, and a Windows tray app for live spend monitoring. Priced at $19/mo after a 3-day trial.

Hacker News Comment Review

  • Commenters question whether the core proxy logic justifies a recurring subscription, given it could be replicated quickly with minimal code.
  • Privacy concerns were raised around routing all LLM traffic through a third-party proxy, even with the no-key-storage claim.

Original | Discuss on HN