Optimizing Agentic Architecture: Strategies for Reducing High Token Costs in Multi-Intent Workflows

Hello everyone,

I’m building an ERP assistant with the OpenAI Agents SDK, and I want to bring the cost down as far as I can before scaling to many users.

Current Architecture

I implemented a custom orchestration layer with:

  • Intent classification step (using a lightweight model)

  • Dynamic model routing based on intent

  • Dynamic tool loading depending on the detected intent

  • Session management using OpenAIConversationsSession

Flow:

  1. User message

  2. -> Intent classification (gpt-4o-mini)

  3. -> Route:

    • model (gpt-4o-mini, gpt-5.x-mini, gpt-4o)

    • tools (semantic search, SQL, actions, PDF, Google Drive, etc.)

  4. -> Run agent with selected tools + prompt

  5. -> Return structured ERP response
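To make steps 2 to 4 concrete, here is a simplified sketch of the routing table. The intent names, tool names, and the `Route` / `select_route` helpers are placeholders for illustration, not my real configuration, and the agent run itself is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Model and tool names selected for one intent."""
    model: str
    tools: tuple


# Hypothetical intents and tool names, for illustration only.
ROUTES = {
    "lookup": Route(model="gpt-4o-mini", tools=("semantic_search",)),
    "report": Route(model="gpt-4o", tools=("sql", "pdf")),
    "action": Route(model="gpt-4o", tools=("actions",)),
}

# Unknown intents fall back to the cheapest model with no tools attached.
DEFAULT_ROUTE = Route(model="gpt-4o-mini", tools=())


def select_route(intent: str) -> Route:
    """Return the model and tool set for a classified intent."""
    return ROUTES.get(intent, DEFAULT_ROUTE)


if __name__ == "__main__":
    route = select_route("report")
    print(route.model, route.tools)
```

The point of the table is that each request only carries the tool definitions for its own intent, so the prompt stays as small as the intent allows.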

Optimization Strategies Already Implemented

  • Using smaller models (gpt-4o-mini) for simple queries

  • Restricting tool availability per intent

  • Custom prompts per intent (to reduce unnecessary reasoning)

  • Session reuse with overflow protection

  • Strict scope enforcement (ERP-only assistant)

  • Limiting max tokens in classification step

Problem

Despite all of this, I’m still seeing relatively high cost:

  • ~20 requests ≈ $0.50 – $1.00 (roughly $0.025 – $0.05 per request)

  • This feels too high for my use case, especially at scale

My Questions

  1. Is this expected with the Agents SDK?

    • Does the SDK internally add hidden token overhead (tool calls, system prompts, etc.)?

  2. Is intent classification doubling my cost unnecessarily?

    • Would it be better to:

      • merge classification into the main agent?

      • or use a rule-based / embedding-based router?

  3. Are tools increasing token usage significantly?

    • Even when not heavily used?

  4. Would switching to a single-model strategy be more efficient?

    • Instead of routing between multiple models

  5. Is there a better pattern than “agent per request”?

    • e.g., long-lived agents, cached context, or hybrid pipelines
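For question 2, this is roughly what I mean by a rule-based router: try keyword rules first and only call the classification model when the rules are inconclusive. It is a minimal sketch. The patterns, the intent names, and the `llm_classifier` callback are all assumptions for illustration, and an embedding-based router would replace `rule_based_intent` with a similarity lookup.

```python
import re
from typing import Callable, Optional

# Hypothetical keyword rules; patterns and intent names are illustrative.
RULES = [
    (re.compile(r"\b(invoice|order|stock|balance)\b", re.I), "lookup"),
    (re.compile(r"\b(report|summary|export)\b", re.I), "report"),
    (re.compile(r"\b(create|update|delete|approve)\b", re.I), "action"),
]


def rule_based_intent(message: str) -> Optional[str]:
    """Return an intent only when exactly one rule matches, else None."""
    matches = {intent for pattern, intent in RULES if pattern.search(message)}
    return matches.pop() if len(matches) == 1 else None


def classify(message: str, llm_classifier: Callable[[str], str]) -> str:
    """Use the free rule-based path first; call the model only on a miss."""
    intent = rule_based_intent(message)
    if intent is not None:
        return intent
    # Ambiguous or unmatched message: fall back to the paid classifier.
    return llm_classifier(message)


if __name__ == "__main__":
    # Stand-in for the real model call, so the sketch runs offline.
    def fake_llm(message: str) -> str:
        return "lookup"

    print(classify("Show me invoice 42", fake_llm))
    print(classify("Create a report", fake_llm))
```

With this shape, the classification call is only paid for on messages the rules cannot settle, so the saving depends on how many requests the rules actually cover.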

Goal

I want to reach something closer to:

  • $0.05 – $0.10 per 20 requests (or similar efficiency)

before scaling to production with many users.
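For reference, here is the arithmetic behind those numbers. The per-request figures follow directly from the totals above. The input-token budget at the end depends on an assumed price of $0.15 per 1M input tokens, which is my assumption for a small model and should be checked against current pricing.

```python
REQUESTS = 20
CURRENT_TOTAL = (0.50, 1.00)  # USD per 20 requests, observed range
TARGET_TOTAL = (0.05, 0.10)   # USD per 20 requests, goal range

# Assumed input price in USD per 1M tokens; verify against current pricing.
ASSUMED_INPUT_PRICE_PER_M = 0.15


def per_request(total_range):
    """Convert a (low, high) total for REQUESTS requests into per-request cost."""
    return tuple(round(total / REQUESTS, 4) for total in total_range)


current = per_request(CURRENT_TOTAL)
target = per_request(TARGET_TOTAL)

# Same ratio at both ends of the range: the goal is about a 10x reduction.
reduction = CURRENT_TOTAL[0] / TARGET_TOTAL[0]

# Input tokens per request that fit in the low-end target, ignoring output tokens.
token_budget = int(target[0] / ASSUMED_INPUT_PRICE_PER_M * 1_000_000)

if __name__ == "__main__":
    print("current per request:", current)
    print("target per request:", target)
    print("required reduction:", reduction)
    print("input-token budget per request:", token_budget)
```

So the target is about a tenfold reduction per request. Under the assumed price, every classification call, tool definition, and session turn has to fit in a budget of roughly 16–17k input tokens per request.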

If anyone has experience optimizing Agents SDK cost at scale, I’d really appreciate guidance, patterns, or architecture feedback.

Thanks a lot 🙏