Tourniquet is a free, local-first proxy that hard-caps your Anthropic API at a daily limit you set. When the cap hits, it kills the stream mid-response — your agent stops, your bill doesn't grow.
You set up an agent. It loops. You wake up to a $400 bill. It's not hypothetical — documented incidents include a single LangChain agent accumulating over $40,000 in a single run, and Cursor users hitting unexpected charges after leaving overnight tasks unattended. The common thread: no hard stop.
Token spend can compound faster than intuition suggests. An agentic loop with tool calls, retries, and long context windows can burn through a daily budget in under ten minutes if something goes sideways. The Anthropic Console offers monthly soft limits; they don't stop an in-flight request.
Existing options all have the same gap. Anthropic's own spend limits are monthly and advisory. LiteLLM is team-grade infrastructure with a meaningful ops surface. Helicone and similar SaaS proxies route your traffic through someone else's cloud — and none of them inject a clean stop mid-stream. They drop the TCP connection, which your agent sees as a generic network error, not a recoverable budget signal.
Tourniquet is a Python package with no system dependencies beyond Python 3.9+.
pip install tourniquet-dev
A single command starts the proxy and opens your browser at the local dashboard. No config files to edit, no environment to learn.
tourniquet
Enter your sk-ant- key in the dashboard, choose a daily dollar limit,
and point your agent at http://localhost:8989.
Done. Your key never leaves your machine.
Install on a desktop or laptop — tourniquet runs as a local proxy.
Coming soon: brew install tourniquet
git clone https://github.com/LowryDaniel/tourniquet cd tourniquet pip install -e ".[dev]" tourniquet
pip install --upgrade tourniquet
tq_ prefix and are bcrypt-hashed — the plaintext is shown once, then discarded.~/.tourniquet/tourniquet.db. You own it.
When your daily cap is reached mid-stream, Tourniquet doesn't drop the TCP connection.
It injects a synthetic SSE message_stop event at the end of the current
partial chunk, then cleanly closes the response:
This matters because the stop_reason field is surfaced to your agent code.
A well-written agent can catch tourniquet_cap_hit, log the event, and exit
gracefully — rather than crashing with an unhandled network exception.
Drop the TCP connection at the cap. The SDK raises a connection error. Your agent sees an unhandled exception — or retries and burns more tokens.
Inject a clean message_stop SSE event with a machine-readable stop_reason. The SDK calls your normal completion callback. You can handle it.