Best Model for Agent Triage (GPT-5 mini/nano) and Passing Messages/Context

Does anyone have experience picking the fastest model that is still smart enough to act as the initial “triage” agent that then hands off to other agents?

In the past I was using gpt-4-mini and felt pretty good about it. Now I’m trying out gpt-5-mini, but with options like reasoning, verbosity, etc., I’m wondering if there are any best practices to follow.

Also, I’m noticing that 5-mini is better at passing messages and context forward in a transfer_to_agent_name function, but it sometimes passes the entire user message in either message or context (even after I tried to discourage that in the instructions for that transfer).

As I’m using a homegrown agents “SDK” (not the official one), is it normal for this to happen? In the Agents SDK, do message/context get filled up like this, and is previous_response_id encouraged in there as well? Or do you drop the previous response once you hand off?

thanks in advance!


I would use a structured output for capturing a guardrail or disposition.

Calling a function is optional, at the AI model’s discretion, but triage is a mandatory job.

The function schema placement communicates more clearly and doesn’t rely on post-training of what the function tool format given to the AI means.
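To make the structured-output idea concrete, here is a minimal sketch of a strict disposition schema plus a local check of the model's reply. The route names, the `blocked` and `reason` fields, and the `triage_disposition` name are all made up for illustration, and the `response_format` wrapper assumes the Chat Completions style of JSON-schema structured outputs; adapt it to whatever your homegrown SDK sends.

```python
import json

# Hypothetical agent names; replace with the agents in your own system.
ROUTES = ["billing", "tech_support", "sales", "none"]

# Every field is required and no extras are allowed, so the model must
# always emit a disposition instead of optionally calling a tool.
DISPOSITION_SCHEMA = {
    "type": "object",
    "properties": {
        "route": {"type": "string", "enum": ROUTES},
        "blocked": {"type": "boolean"},  # guardrail flag
        "reason": {"type": "string"},    # short justification
    },
    "required": ["route", "blocked", "reason"],
    "additionalProperties": False,
}

# Assumed wrapper shape for a JSON-schema response format.
RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {
        "name": "triage_disposition",
        "strict": True,
        "schema": DISPOSITION_SCHEMA,
    },
}


def parse_disposition(raw: str) -> dict:
    """Parse the model's JSON reply and re-check it locally before routing."""
    data = json.loads(raw)
    if not isinstance(data, dict) or set(data) != set(DISPOSITION_SCHEMA["required"]):
        raise ValueError("disposition must contain exactly route, blocked, reason")
    if data["route"] not in ROUTES:
        raise ValueError(f"unknown route: {data['route']!r}")
    if not isinstance(data["blocked"], bool) or not isinstance(data["reason"], str):
        raise ValueError("blocked must be a boolean and reason a string")
    return data
```

The local re-check is redundant when strict mode is honored, but it is cheap and keeps the router safe if you swap models or providers.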

I don’t understand the “passing messages” you are attempting. You could make a strict structured output that only accepts a number property in an array, and have the AI give “index numbers of important chat related to the latest question” based on your shown message number (as an example of an application paying more than embeddings.)
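A rough sketch of that “index numbers” idea, assuming you show the model a numbered transcript and ask for a strict object with a single integer array. The `relevant` property name and both helper functions are hypothetical, not part of any SDK.

```python
import json

# Strict schema: the model may only return an array of message numbers.
RELEVANT_SCHEMA = {
    "type": "object",
    "properties": {
        "relevant": {"type": "array", "items": {"type": "integer"}},
    },
    "required": ["relevant"],
    "additionalProperties": False,
}


def number_messages(messages):
    """Render the chat with visible index numbers the model can cite."""
    return "\n".join(
        f"[{i}] {m['role']}: {m['content']}" for i, m in enumerate(messages)
    )


def select_messages(messages, raw):
    """Keep only the messages the model pointed at, in original order.

    Out-of-range and duplicate indices are ignored rather than trusted.
    """
    indices = json.loads(raw)["relevant"]
    picked = set()
    for i in indices:
        if isinstance(i, int) and not isinstance(i, bool) and 0 <= i < len(messages):
            picked.add(i)
    return [messages[i] for i in sorted(picked)]
```

Because the model can only emit numbers, it has no way to paste the whole user message into the handoff; your own code decides which text gets forwarded.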


Right now is a pretty bad time to figure out which model will be permanently “super quick”, but you can start evaluating “smartest” (where I can give you an expensive answer).

Performance with a small input and a small output

| Model | Trials | Avg Stream Latency (s) | Avg Rate (tokens/s) |
|---|---|---|---|
| gpt-4.1-mini | 10 | 0.890 | 7.386 |
| gpt-5-nano | 10 | 1.166 | 2.138 |
| gpt-4o-mini | 10 | 1.052 | 6.553 |
| gpt-5-mini | 10 | 1.211 | 5.671 |

Unique responses for gpt-4.1-mini (by first 60 chars):

10 | The capital of France is Paris.

Unique responses for gpt-5-nano (by first 60 chars):

9 | Paris.
1 | Paris is the capital of France.

Unique responses for gpt-4o-mini (by first 60 chars):

10 | The capital of France is Paris.

Unique responses for gpt-5-mini (by first 60 chars):

10 | The capital of France is Paris.
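If you want to run a similar comparison yourself, here is a minimal sketch of a timing harness. It is not the script that produced the numbers above: it assumes “stream latency” means time to the first streamed chunk and that the rate is chunks divided by total elapsed time (chunks only approximate tokens). `measure_stream` and `tally_responses` are hypothetical helper names, and you would pass in whatever iterable of text chunks your client yields.

```python
import time
from collections import Counter


def measure_stream(chunks, clock=time.perf_counter):
    """Consume an iterable of text chunks and time it.

    Returns time to first chunk, chunks per second over the whole call,
    and the joined text.
    """
    start = clock()
    first = None
    parts = []
    for chunk in chunks:
        if first is None:
            first = clock()  # moment the first chunk arrived
        parts.append(chunk)
    end = clock()
    elapsed = end - start
    return {
        "latency_s": None if first is None else first - start,
        "rate_per_s": len(parts) / elapsed if elapsed > 0 else None,
        "text": "".join(parts),
    }


def tally_responses(texts, width=60):
    """Count unique responses by their first `width` characters."""
    return Counter(t[:width] for t in texts).most_common()
```

Run it several times per model and average, since a single trial is dominated by network and queueing noise.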


In my efforts to duplicate the Agents SDK, I followed their strategy of creating a function called transfer_to_name_agent that takes an object with both message and context in it.

Prior to GPT-5 these two would be kind of summarized, but now they are quite chock full of tokens!
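One possible workaround, sketched under assumptions: keep the transfer tool, but describe the two fields as summaries and add a guard in your own code that blanks any field that simply repeats the user message. The tool shape below assumes a Chat Completions style function definition, `build_transfer_tool` and `trim_handoff_args` are hypothetical helpers, and blanking the field only makes sense if the receiving agent already gets the original user message in its history.

```python
def build_transfer_tool(agent_name):
    """Tool definition for transfer_to_<name>_agent with message and context."""
    return {
        "type": "function",
        "function": {
            "name": f"transfer_to_{agent_name}_agent",
            "description": f"Hand the conversation off to the {agent_name} agent.",
            "parameters": {
                "type": "object",
                "properties": {
                    "message": {
                        "type": "string",
                        "description": "One-sentence summary of the request. Do not quote the user.",
                    },
                    "context": {
                        "type": "string",
                        "description": "Short notes the next agent needs. Do not quote the user.",
                    },
                },
                "required": ["message", "context"],
                "additionalProperties": False,
            },
        },
    }


def _norm(text):
    """Lowercase and collapse whitespace so near-verbatim copies still match."""
    return " ".join(text.lower().split())


def trim_handoff_args(args, user_message, max_chars=280):
    """Blank fields that contain the whole user message; cap the rest."""
    user = _norm(user_message)
    cleaned = {}
    for key in ("message", "context"):
        value = args.get(key, "")
        if user and user in _norm(value):
            value = ""  # assumption: the next agent already sees the original
        cleaned[key] = value[:max_chars]
    return cleaned
```

The `max_chars` cap is arbitrary; it just keeps a verbose model from flooding the next agent's prompt.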