LLMCap – A proxy that hard-stops LLM API calls when you hit a dollar cap

May 19, 2026 · ai web · Source ↗

TLDR

LLMCap is a paid proxy service that enforces hard dollar caps on LLM API calls, rejecting requests with HTTP 429 once a budget is hit.

Supports Anthropic, OpenAI, Google Gemini, Mistral, and Cohere via a single base URL swap; adds under 35 ms of latency.
Hard enforcement means the next request after the cap is hit is blocked before it reaches the provider, so no tokens are consumed or billed.
Streaming is supported; if the cap is hit mid-stream, a 429 SSE event is sent and the connection closes, and the triggering token is not charged.
Provider API keys are passed through per-request and immediately discarded; only a bcrypt-hashed LLMCap proxy key is stored.
Tooling includes a VS Code extension, PyPI CLI, and a Windows tray app for live spend monitoring. Priced at $19/mo after a 3-day trial.
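
The enforcement behavior described above (reject with 429 before forwarding, and cut a stream off when the cap is reached) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not LLMCap's actual implementation: the `BudgetGate` class, the `stream_with_cap` function, the flat per-token price, and the exact SSE event text are all hypothetical, and amounts are tracked in integer micro-dollars to avoid floating-point drift.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class BudgetGate:
    """Hypothetical spend tracker; amounts are integer micro-dollars."""
    cap: int
    spent: int = 0

    def admit(self) -> int:
        """Status to return before forwarding a request upstream.

        429 once the cap is reached, so the request never hits the provider.
        """
        return 429 if self.spent >= self.cap else 200

    def charge(self, amount: int) -> None:
        """Record spend against the budget."""
        self.spent += amount


def stream_with_cap(
    gate: BudgetGate, tokens: Iterable[str], price_per_token: int
) -> Iterator[str]:
    """Relay tokens as SSE data events until the next one would exceed the cap."""
    for token in tokens:
        if gate.spent + price_per_token > gate.cap:
            # The triggering token is neither relayed nor charged here.
            # The event name and message are assumptions for illustration.
            yield "event: error\ndata: 429 budget cap reached\n\n"
            return
        gate.charge(price_per_token)
        yield f"data: {token}\n\n"


if __name__ == "__main__":
    gate = BudgetGate(cap=3)
    for event in stream_with_cap(gate, ["a", "b", "c", "d", "e"], price_per_token=1):
        print(event, end="")
    print("next request status:", gate.admit())
```

In this sketch the gate is checked twice: once per request via `admit()` and once per streamed token. A real proxy would also need to price tokens per provider and model, which is omitted here.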

Commenters question whether the core proxy logic justifies a recurring subscription, given it could be replicated quickly with minimal code.
Commenters also raise privacy concerns about routing all LLM traffic through a third-party proxy, even with the no-key-storage claim.