Demo 13 — `token-budget` plugin

Module brief: §Module 13

UVP

Per-(agent, model) cost ceiling for AI traffic. Estimated from response bytes (≈ 4 bytes per token). Stops a runaway agent before the AWS bill hits.

Use cases

LLM agent cost control. Each agent ID gets a daily token budget; runaway loops stop at the budget instead of running unbounded.
Per-model billing throttle. Different cost ceilings per model — claude-opus has a higher budget than gpt-3.5 because each query costs more.

What this demo shows

Seed (rag-bot, claude-opus) budget at minute=10, day=100.
Agent makes a query → estimated cost computed from response bytes.
Repeat until daily budget is exhausted (~100 queries).
Next query → blocked with retry-in-N-seconds message.

Run it

cd demos/v0.4.0/13-token-budget
./demo.sh

Interactive script that:

Loads ai-classifier.wasm + token-budget.wasm into the proxy.
Connects with application_name=rag-bot-claude-opus-4-7 so ai-classifier tags agent_id=rag-bot... + model_id=claude-opus.
Runs queries in a loop, prints the running cost.
Stops at the budget block.

Implementation pointer

HDB-HeliosDB-Proxy-Plugins/token-budget/src/lib.rs. Cost model: tokens ≈ response_bytes / 4. Budget windows: minute + day. Day overrides minute (same precedence as cost-governor).

HeliosDB compatibility

Backend-agnostic.

Demo 13 — `token-budget` plugin

Demo 13 — token-budget plugin

UVP

Use cases

What this demo shows

Run it

Implementation pointer

HeliosDB compatibility

Demo 13 — `token-budget` plugin