Demo 13 — `token-budget` plugin
Demo 13 — token-budget plugin
Module brief: §Module 13
UVP
Per-
(agent, model)cost ceiling for AI traffic. Estimated from response bytes (≈ 4 bytes per token). Stops a runaway agent before the AWS bill hits.
Use cases
- LLM agent cost control. Each agent ID gets a daily token budget; runaway loops stop at the budget instead of running unbounded.
- Per-model billing throttle. Different cost ceilings per
model —
claude-opushas a higher budget thangpt-3.5because each query costs more.
What this demo shows
- Seed
(rag-bot, claude-opus)budget at minute=10, day=100. - Agent makes a query → estimated cost computed from response bytes.
- Repeat until daily budget is exhausted (~100 queries).
- Next query → blocked with retry-in-N-seconds message.
Run it
cd demos/v0.4.0/13-token-budget./demo.shInteractive script that:
- Loads
ai-classifier.wasm+token-budget.wasminto the proxy. - Connects with
application_name=rag-bot-claude-opus-4-7soai-classifiertagsagent_id=rag-bot...+model_id=claude-opus. - Runs queries in a loop, prints the running cost.
- Stops at the budget block.
Implementation pointer
HDB-HeliosDB-Proxy-Plugins/token-budget/src/lib.rs. Cost model:
tokens ≈ response_bytes / 4. Budget windows: minute + day. Day
overrides minute (same precedence as cost-governor).
HeliosDB compatibility
Backend-agnostic.