TLDR
- LLMCap is a paid proxy service that enforces hard dollar caps on LLM API calls, rejecting requests with HTTP 429 once a budget is hit.
Key Takeaways
- Supports Anthropic, OpenAI, Google Gemini, Mistral, and Cohere through a single base URL swap, and adds under 35 ms of latency.
- Hard enforcement means that once the cap is hit, the next request is blocked before it reaches the provider, so no tokens are consumed or billed.
- Streaming is supported. If the cap is hit mid-stream, the connection is closed with a 429 SSE event, and the triggering token is not charged.
- Provider API keys are passed through per request and immediately discarded; only a bcrypt-hashed LLMCap proxy key is stored.
- Tooling includes a VS Code extension, a CLI distributed on PyPI, and a Windows tray app for live spend monitoring. Priced at $19/mo after a 3-day trial.
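The hard-enforcement behavior described above can be illustrated with a minimal sketch. This is not LLMCap's code: `BudgetGate`, `handle`, and the `forward` callback are hypothetical names, costs are tracked in integer cents for simplicity, and the only behavior taken from the summary is that a request arriving after the cap is hit gets a 429 without the provider being contacted.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BudgetGate:
    """Hypothetical hard-cap gate (illustrative only, not LLMCap's implementation)."""
    cap_cents: int
    spent_cents: int = 0

    def is_open(self) -> bool:
        # The gate stays open only while recorded spend is below the cap.
        return self.spent_cents < self.cap_cents

    def record(self, cost_cents: int) -> None:
        # Add the cost of a completed provider call to the running total.
        self.spent_cents += cost_cents


def handle(gate: BudgetGate, cost_cents: int, forward: Callable[[], None]) -> int:
    """Return the HTTP status a proxy of this kind might send to the client."""
    if not gate.is_open():
        # Cap already hit: reject before forwarding, so the provider bills nothing.
        return 429
    forward()                 # stand-in for the real upstream provider call
    gate.record(cost_cents)   # assumed: cost is known once the call completes
    return 200
```

In this sketch the check happens before `forward` is invoked, which is what makes the cap "hard": the request that arrives after the budget is exhausted never reaches the provider. A real proxy would also need to handle concurrent requests and mid-stream cutoffs, which are left out here.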
Hacker News Comment Review
- Commenters question whether the core proxy logic justifies a recurring subscription, given it could be replicated quickly with minimal code.
- Privacy concerns were raised around routing all LLM traffic through a third-party proxy, even with the no-key-storage claim.