My understanding is that “gpt-4o-2024-05-13” is the dated snapshot that the “gpt-4o” alias currently points to, so they should be the same model. I did try it anyway and saw one working run, followed by more broken runs.
@brianz-oai Here is a LangSmith trace of a run that clearly fails to follow the given schema. I’m not allowed to post a link, but the trace is available on the LangSmith website (smith langchain com) under /public/40e38079-fc4a-4da6-b6cd-e36b2869380d/r
Here’s the schema of the function tool provided:
{
  "name": "Answer",
  "description": "Response to the question, containing 3 keys: answer, reflection, search_queries",
  "parameters": {
    "type": "object",
    "properties": {
      "answer": {
        "description": "~250 word detailed answer to the question",
        "type": "string"
      },
      "reflection": {
        "description": "Your reflection/critiques of your answer",
        "allOf": [
          {
            "title": "Reflection",
            "type": "object",
            "properties": {
              "missing": {
                "title": "Missing",
                "description": "Critique of what is missing",
                "type": "string"
              },
              "superfluous": {
                "title": "Superfluous",
                "description": "Critique of what is superfluous",
                "type": "string"
              }
            },
            "required": [
              "missing",
              "superfluous"
            ]
          }
        ]
      },
      "search_queries": {
        "description": "The final top level key containing 1-3 search queries to use for researching improvements to address the critiques of your answer",
        "type": "array",
        "items": {
          "type": "string"
        }
      }
    },
    "required": [
      "answer",
      "reflection",
      "search_queries"
    ]
  }
}
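One thing that might be worth trying, and this is purely an untested assumption on my part rather than a confirmed fix: the "reflection" property is wrapped in a single-element "allOf", and inlining that wrapper gives the model a flatter schema to read. The helper below is a hypothetical sketch (the name `inline_single_allof` is mine, not part of any library) that merges a one-element "allOf" into its parent so "reflection" becomes a plain object schema.

```python
def inline_single_allof(schema):
    """Recursively merge any single-element "allOf" into its parent schema.

    Hypothetical helper for experimentation only; it does not handle
    multi-element "allOf" lists, which are left untouched.
    """
    if isinstance(schema, list):
        return [inline_single_allof(item) for item in schema]
    if not isinstance(schema, dict):
        return schema

    # Process children first so nested wrappers are flattened too.
    result = {key: inline_single_allof(value) for key, value in schema.items()}

    all_of = result.get("allOf")
    if isinstance(all_of, list) and len(all_of) == 1 and isinstance(all_of[0], dict):
        merged = dict(all_of[0])
        # Keys set on the parent (such as "description") take precedence.
        for key, value in result.items():
            if key != "allOf":
                merged[key] = value
        return merged
    return result


# Cut-down version of the "reflection" property from the schema above.
reflection = {
    "description": "Your reflection/critiques of your answer",
    "allOf": [
        {
            "title": "Reflection",
            "type": "object",
            "properties": {
                "missing": {"type": "string"},
                "superfluous": {"type": "string"},
            },
            "required": ["missing", "superfluous"],
        }
    ],
}
flattened = inline_single_allof(reflection)
```

Whether this changes gpt-4o's behavior at all is something I have not verified.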
And here are the messages:
[
  {
    "content": "You are an expert online researcher\n Current time: 2024-06-11T09:08:54.991070\n\n Here are your instructions:\n 1. Provide a detailed ~250 word answer.\n 2. Reflect and Critique your answer to step 1. Be severe to maximise improvement.\n 3. Recommend search queries to research information that will assist in improving your answer.\n ",
    "type": "system"
  },
  {
    "content": "Who is the most popular musician in the United States right now?",
    "type": "human"
  },
  {
    "content": "Answer the user's question above using the required format.",
    "type": "system"
  }
]
gpt-4o is nesting “search_queries” under the “reflection” key, when it’s supposed to be a top-level key. I realize this is kind of a nuanced schema, but gpt-4o pretty consistently produces the same error (roughly 90% of the time), regardless of how I phrase the field descriptions. gpt-4-turbo, on the other hand, consistently produces a correct result.
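To make the failure easy to count across runs, here is a minimal sketch of a check on the tool-call arguments. It assumes the arguments arrive as a JSON string (as they do in a function/tool call), and the function name `check_answer_arguments` and the sample payloads are hypothetical, made up for illustration rather than taken from the trace.

```python
import json


def check_answer_arguments(arguments_json):
    """Return a list of problems found in the "Answer" tool-call arguments."""
    args = json.loads(arguments_json)
    problems = []

    # All three keys are required at the top level by the schema.
    for key in ("answer", "reflection", "search_queries"):
        if key not in args:
            problems.append(f"missing top-level key: {key}")

    # The specific error described above: search_queries placed inside reflection.
    reflection = args.get("reflection")
    if isinstance(reflection, dict) and "search_queries" in reflection:
        problems.append("search_queries is nested under reflection")

    return problems


# Illustrative payloads (not real model output).
good = json.dumps({
    "answer": "...",
    "reflection": {"missing": "...", "superfluous": "..."},
    "search_queries": ["query one"],
})
bad = json.dumps({
    "answer": "...",
    "reflection": {
        "missing": "...",
        "superfluous": "...",
        "search_queries": ["query one"],
    },
})

good_problems = check_answer_arguments(good)
bad_problems = check_answer_arguments(bad)
```

Running this over a batch of responses from each model would give a concrete failure rate to compare instead of my rough estimate.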