some background: we’re building agents using gpt-3.5-turbo.
And we’re also seeing a drastic decrease in its intelligence with the new version, major problems:
hallucination increase:
Using the same prompt, the new one is worse at:
-
Information extraction: it’ll get wrong result from context.
Eg: I want it to find a specific SKU from a list of SKUs, it’ll give me the right name, properties, but will give me the wrong id.
-
Format stability: it’ll more often break our format restrain, no matter if we use function_call or not.
-
Worse logic
Still with SKU search example: It’ll some times give me a matching SKU, (both in its reasoning and final structured result), which doesn’t match anyone in provided list.
Speed decrease
We’re using both OpenAI and Azure. And there’s also a major performance different between the two models, something like 50% in RT.
we’re forced to decrease temperature and providing more examples, but still struggling to port all of our agents to the new model…