Running into something I can’t find a clean answer to.
The Agents SDK gives great observability — you can trace every call,
log every token. But logging isn’t enforcement. If an agent loops or
retries aggressively, the cost accumulates in real time and the trace
shows you what happened after the fact.
I’ve been looking at a few approaches:
1. Wrap each tool call with a pre-check that queries a spend ledger
   before execution, with a hard stop if the policy would be breached
2. Custom Runner subclass that intercepts before model calls
3. External policy engine the agent calls as a tool
Option 3 is what I’ve been building — treating budget as a tool the
agent calls before any paid operation. The agent asks “can I spend
$0.05?” and gets approved/denied before the call fires.
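A minimal sketch of that approve/deny exchange, assuming a single in-memory policy object with one hard cap. `BudgetPolicy` and `request_spend` are hypothetical names for illustration, not part of the Agents SDK; in practice the method would be exposed to the agent as a tool and backed by a real ledger.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class BudgetPolicy:
    """Hypothetical in-memory policy: one hard cap for the whole run."""
    cap_usd: Decimal
    spent_usd: Decimal = Decimal("0")

    def request_spend(self, amount_usd: str) -> dict:
        """Approve and record the spend, or deny it without side effects."""
        amount = Decimal(amount_usd)
        if amount <= 0:
            return {"approved": False, "reason": "amount must be positive"}
        if self.spent_usd + amount > self.cap_usd:
            return {"approved": False, "reason": "would exceed cap"}
        self.spent_usd += amount
        remaining = self.cap_usd - self.spent_usd
        return {"approved": True, "remaining_usd": str(remaining)}


policy = BudgetPolicy(cap_usd=Decimal("0.10"))
first = policy.request_spend("0.05")   # approved
second = policy.request_spend("0.05")  # approved, cap is now fully used
third = policy.request_spend("0.05")   # denied, nothing is recorded
```

Amounts are parsed as `Decimal` from strings so repeated small charges do not pick up float rounding error.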
Has anyone else approached this differently? Curious whether the
community has patterns for hard enforcement (not just monitoring)
at the agent level.
There are a few things you could try, but this seems like a complex case that may need further research.
I’d consider using an object for usage tracking, passed to the agents as context. This might help you control the flow.
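A rough sketch of what such a usage-tracking object could look like. `UsageTracker`, `BudgetExceeded`, and the token limit are hypothetical names and values; how the object is handed to the agents as context is left out here, so treat this as the shape of the idea rather than SDK code.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when recorded usage goes past the token limit."""


@dataclass
class UsageTracker:
    """Hypothetical context object shared by every agent and tool in a run."""
    max_total_tokens: int
    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add usage from one model call, then fail loudly if over the limit."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        if self.total_tokens > self.max_total_tokens:
            raise BudgetExceeded(
                f"{self.total_tokens} tokens used, limit is {self.max_total_tokens}"
            )


tracker = UsageTracker(max_total_tokens=1000)
tracker.record(input_tokens=400, output_tokens=200)  # 600 total, within limit
try:
    tracker.record(input_tokens=300, output_tokens=200)  # 1100 total, over limit
    stopped = False
except BudgetExceeded:
    stopped = True
```

Note that this records usage after each call returns, so it bounds the overshoot to one call rather than preventing it outright.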
For hard enforcement, there are hooks. While they are called in real time, they don’t have a graceful mechanism to stop the flow.
There is also a max_turns parameter that caps the number of turns in a run. It guards against runaway loops, which could otherwise lead to major budget issues.
Hey mate, I think Option 3 is the right instinct: budget as a pre-execution gate, not a post-hoc trace. The tricky bits we ran into building this: updating the spend ledger atomically so concurrent calls don’t step past the cap, and keeping the policy engine fast enough that the pre-check doesn’t add noticeable latency. Decide early on whether the agent asks permission or the gate sits outside the agent loop entirely (non-bypassable). The second is safer, since a misbehaving agent can’t skip its own check.
We’ve been building it as a separate layer. Would be happy to share notes.
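To make the atomicity point concrete, here is a minimal sketch of a ledger that checks and reserves in one locked step, with the gate wrapped around the paid call instead of being something the agent chooses to invoke. `SpendLedger`, `gated_call`, and the amounts are hypothetical; a real deployment would more likely use a database transaction or an atomic counter in a shared store than an in-process lock.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SpendLedger:
    """Hypothetical ledger; amounts are integer micro-dollars to avoid float drift."""

    def __init__(self, cap_micro_usd: int) -> None:
        self._cap = cap_micro_usd
        self._reserved = 0
        self._lock = threading.Lock()

    def try_reserve(self, amount: int) -> bool:
        """Check and increment under one lock so two callers cannot both pass."""
        with self._lock:
            if self._reserved + amount > self._cap:
                return False
            self._reserved += amount
            return True

    def release(self, amount: int) -> None:
        """Return a reservation when the paid call fails before spending."""
        with self._lock:
            self._reserved -= amount

    @property
    def reserved(self) -> int:
        with self._lock:
            return self._reserved


def gated_call(ledger: SpendLedger, cost: int, paid_fn):
    """Gate outside the agent loop: callers only ever get the gated path."""
    if not ledger.try_reserve(cost):
        return "denied"
    try:
        return paid_fn()
    except Exception:
        ledger.release(cost)
        raise


ledger = SpendLedger(cap_micro_usd=500_000)  # $0.50 cap
with ThreadPoolExecutor(max_workers=8) as pool:
    # 40 concurrent requests at $0.05 each; only 10 fit under the cap.
    results = list(
        pool.map(lambda _: gated_call(ledger, 50_000, lambda: "ok"), range(40))
    )
```

The key design choice is that the check and the increment happen inside the same critical section. A separate "read balance, then write" sequence would let two concurrent calls both see headroom and both proceed.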