Optimizing Agentic Architecture: Strategies for Reducing High Token Costs in Multi-Intent Workflows

Backend-Developer · April 16, 2026, 3:52pm

Hello everyone,

I’m currently building an ERP assistant using the OpenAI Agents SDK, and I’m trying to cut costs as much as possible before scaling to many users.

Current Architecture

I implemented a custom orchestration layer with:

Intent classification step (using a lightweight model)
Dynamic model routing based on intent
Dynamic tool loading depending on the detected intent
Session management using OpenAIConversationsSession

Flow:

User message
-> Intent classification (gpt-4o-mini)
-> Route:
- model (gpt-4o-mini, gpt-5.x-mini, gpt-4o)
- tools (semantic search, SQL, actions, PDF, Google Drive, etc.)
-> Run agent with selected tools + prompt
-> Return structured ERP response
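
A minimal sketch of what the routing step in this flow could look like, assuming a plain lookup table from intent to model and tool list. The intent names ("lookup", "report", "action") and the tool identifiers are hypothetical placeholders, not the real ones from the project, and no SDK calls are shown:

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Route:
    """Model and tool names selected for one intent."""
    model: str
    tools: Tuple[str, ...] = ()


# Hypothetical intents and tool identifiers, for illustration only.
ROUTES: Dict[str, Route] = {
    "lookup": Route("gpt-4o-mini", ("semantic_search",)),
    "report": Route("gpt-4o", ("sql", "pdf")),
    "action": Route("gpt-4o-mini", ("actions",)),
}

# Unknown intents fall back to the cheapest model with no tools attached.
DEFAULT_ROUTE = Route("gpt-4o-mini")


def select_route(intent: str) -> Route:
    """Return the route for a classified intent, or the default route."""
    return ROUTES.get(intent, DEFAULT_ROUTE)
```

The selected route would then decide which model and which tool definitions are passed to the agent for that request.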

Optimization Strategies Already Implemented

Using smaller models (gpt-4o-mini) for simple queries
Restricting tool availability per intent
Custom prompts per intent (to reduce unnecessary reasoning)
Session reuse with overflow protection
Strict scope enforcement (ERP-only assistant)
Limiting max tokens in classification step

Problem

Despite all of this, I’m still seeing relatively high cost:

About 20 requests cost roughly $0.50 to $1.00
This feels too high for my use case, especially at scale

My Questions

Is this expected with the Agents SDK?
- Does the SDK internally add hidden token overhead (tool calls, system prompts, etc.)?
Is intent classification doubling my cost unnecessarily?
- Would it be better to:
  - merge classification into the main agent?
  - or use a rule-based / embedding-based router?
Are tools increasing token usage significantly?
- Even when not heavily used?
Would switching to a single-model strategy be more efficient?
- Instead of routing between multiple models
Is there a better pattern than “agent per request”?
- e.g., long-lived agents, cached context, or hybrid pipelines
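
To make the rule-based router option from the second question concrete, here is a rough sketch, assuming keyword patterns are enough to catch the obvious cases. The intent names and keywords are hypothetical. Anything that matches no rule returns None, so the caller could still fall back to the existing LLM classification step:

```python
import re
from typing import List, Optional, Pattern, Tuple

# Hypothetical keyword rules, checked in order; the first match wins.
RULES: List[Tuple[str, Pattern[str]]] = [
    ("report", re.compile(r"\b(report|summary|total|revenue)\b", re.IGNORECASE)),
    ("action", re.compile(r"\b(create|update|delete|approve)\b", re.IGNORECASE)),
    ("lookup", re.compile(r"\b(find|show|search|where)\b", re.IGNORECASE)),
]


def rule_based_intent(message: str) -> Optional[str]:
    """Return an intent if a keyword rule matches, else None.

    None means "no cheap answer": the caller would fall back to the
    model-based classifier for that message.
    """
    for intent, pattern in RULES:
        if pattern.search(message):
            return intent
    return None
```

With something like this in front, the classification model would only be called for messages the rules cannot place.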

Goal

I want to reach something closer to:

$0.05 – $0.10 per 20 requests (or similar efficiency)

before scaling to production with many users.
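
For reference, the arithmetic behind these numbers, as a small sketch: the current figures work out to about 2.5 to 5 cents per request, the target to about 0.25 to 0.5 cents per request, so the goal is roughly a 10x reduction. The variable and function names are illustrative only:

```python
REQUESTS = 20

# Figures from the post, in US dollars per 20 requests.
CURRENT_LOW, CURRENT_HIGH = 0.50, 1.00
TARGET_LOW, TARGET_HIGH = 0.05, 0.10


def per_request(total_cost: float, requests: int = REQUESTS) -> float:
    """Average cost of a single request."""
    return total_cost / requests


def reduction_factor(current: float, target: float) -> float:
    """How many times cheaper the target is than the current cost."""
    return current / target


if __name__ == "__main__":
    print("current per request:", per_request(CURRENT_LOW), "to", per_request(CURRENT_HIGH))
    print("target per request: ", per_request(TARGET_LOW), "to", per_request(TARGET_HIGH))
    print("reduction needed:   ", round(reduction_factor(CURRENT_LOW, TARGET_LOW), 2), "x")
```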

If anyone has experience optimizing Agents SDK costs at scale, I’d really appreciate guidance, patterns, or even architecture feedback.

Thanks a lot
