Huge quality drop in gpt-4-turbo

Last week, I ran gpt-4-turbo on my task (>1000 inferences) and it worked great. Today, I am trying the exact same prompts and it's performing much worse. For example, it now consistently fails to write output in the expected format, even though I included five examples of that format.

Has something changed with the gpt-4-turbo API recently? FWIW, I am using gpt-4-turbo instead of gpt-4o because regular GPT-4 worked better for my task when I compared the outputs last week.


Usually, if any changes are made to the underlying model, the system_fingerprint field in the response changes.
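If you log the raw responses, you can check this yourself. Below is a minimal sketch, assuming each logged record is a dict holding the created timestamp and system_fingerprint fields copied from the Chat Completions response (the log format and the function name fingerprint_changes are hypothetical). It reports every point where the fingerprint differs from the previous request, so a change around the time quality dropped would stand out. The Python SDK types system_fingerprint as optional, so a missing value is reported as None.

```python
from typing import Any, Dict, List, Optional, Tuple


def fingerprint_changes(records: List[Dict[str, Any]]) -> List[Tuple[int, Optional[str]]]:
    """Return (created, system_fingerprint) pairs wherever the fingerprint changes.

    Each record is assumed (hypothetical log format) to be a dict copied from a
    Chat Completions response, with "created" (Unix seconds) and
    "system_fingerprint" keys. A missing fingerprint is reported as None.
    """
    changes: List[Tuple[int, Optional[str]]] = []
    previous: object = object()  # sentinel, so the first record is always reported
    for record in sorted(records, key=lambda r: r["created"]):
        fingerprint = record.get("system_fingerprint")
        if fingerprint != previous:
            changes.append((record["created"], fingerprint))
            previous = fingerprint
    return changes
```

Note that a fingerprint change only signals a backend configuration change; it does not by itself say whether output quality went up or down.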

I'm running a few of your prompts with the same configuration to see whether it's a one-time issue.
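One way to tell a one-off failure from a systematic one is to repeat the same prompt several times and measure how often the output passes a format check. Here is a rough sketch; generate is a hypothetical callable you would wrap around your own API call (ideally with fixed settings such as temperature=0 and the seed parameter, which OpenAI describes as best-effort determinism), and is_valid is whatever format check fits your task.

```python
from typing import Callable


def pass_rate(
    generate: Callable[[str], str],
    prompt: str,
    is_valid: Callable[[str], bool],
    n: int = 10,
) -> float:
    """Call `generate` n times with the same prompt; return the fraction of valid outputs.

    `generate` is a hypothetical wrapper around your own API call, kept at fixed
    settings (for example temperature=0 and a fixed seed) between runs.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    passed = sum(1 for _ in range(n) if is_valid(generate(prompt)))
    return passed / n
```

Comparing this rate for the same prompts across days (or across model names) gives something more concrete than "it feels worse".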

I am also having this issue. It was working fine last week.

We are experiencing the same thing. We migrated from gpt-4-turbo to Omni (gpt-4o) and then back after a few days. We're using the Chat API to generate JSON, and the number of JSON formatting errors has increased dramatically (not to mention issues with the content itself). The change is evident in our logs; something has definitely gotten worse.
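To put a number on a JSON regression like this, a sketch along these lines counts outputs that either fail to parse or parse to something other than a JSON object. It assumes the raw completion strings have already been pulled out of the logs into a list; json_error_rate is a hypothetical helper name. Running it over last week's and this week's logs separately gives a like-for-like comparison.

```python
import json
from typing import List


def json_error_rate(outputs: List[str]) -> float:
    """Fraction of raw model outputs that are not a valid JSON object."""
    if not outputs:
        return 0.0
    errors = 0
    for text in outputs:
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            errors += 1  # not parseable at all
            continue
        if not isinstance(parsed, dict):
            errors += 1  # valid JSON, but not the object shape we asked for
    return errors / len(outputs)
```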

Of all the formats, I have found the Markdown table format to be the most consistent across the new models. So, if possible, I would suggest adapting your code to the Markdown table format.
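If you go the Markdown table route, the model output then has to be parsed back into rows. Here is a minimal sketch, assuming a plain pipe-delimited table with a header row and a |---| separator row; it does not handle escaped pipes inside cells, and parse_markdown_table is a hypothetical name.

```python
from typing import Dict, List


def parse_markdown_table(text: str) -> List[Dict[str, str]]:
    """Parse the first pipe-delimited Markdown table in `text` into row dicts.

    Assumes a header row followed by a |---|---| separator row; escaped
    pipes inside cells are not handled.
    """
    table_lines: List[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("|"):
            table_lines.append(stripped)
        elif table_lines:
            break  # the first non-table line after the table ends it

    # Need at least a header and a separator made only of |, -, : and spaces.
    if len(table_lines) < 2 or not set(table_lines[1]) <= set("|-: "):
        return []

    def cells(row: str) -> List[str]:
        return [cell.strip() for cell in row.strip("|").split("|")]

    header = cells(table_lines[0])
    return [dict(zip(header, cells(row))) for row in table_lines[2:]]
```

Because any chatter before or after the table is skipped, this tolerates the model adding a sentence like "Here you go:" around the table.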

Most probably all the quality has gone over to gpt-4o… :sweat_smile:

Could you explain in a bit more detail? How many times did you check the "quality" last week, and what exactly has changed since then?

Does this topic concern the API, not ChatGPT?

I am using the API. That's why I included the tag.

It's a simple instruction: the model just needs to wrap the response in 'finalAnswer' tags, but now I'm getting inconsistent tags like 'finalFoo', 'finalBar', etc. This wasn't a problem last week, when I ran thousands of inferences.
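For tag drift like this, it may help to extract the answer strictly and record any near-miss tags, so the failure rate becomes measurable rather than anecdotal. The sketch below assumes XML-style tags (<finalAnswer>...</finalAnswer>) and treats any tag name starting with "final" as a candidate; both the tag format and the helper extract_final_answer are assumptions, not something taken from the original prompt.

```python
import re
from typing import List, Optional, Tuple

# Any XML-style tag pair whose name starts with "final", e.g. <finalAnswer>...</finalAnswer>.
# The backreference \1 requires the closing tag to match the opening one.
_FINAL_TAG_RE = re.compile(r"<(final\w*)>(.*?)</\1>", re.DOTALL)


def extract_final_answer(
    text: str, expected: str = "finalAnswer"
) -> Tuple[Optional[str], List[str]]:
    """Return (answer, drifted_tags).

    answer is the stripped body of the first <expected> tag pair, or None if absent.
    drifted_tags lists the other final* tag names found when the expected tag is missing.
    """
    found = _FINAL_TAG_RE.findall(text)  # list of (tag_name, body) tuples
    for tag, body in found:
        if tag == expected:
            return body.strip(), []
    return None, [tag for tag, _ in found]
```

Counting how often drifted_tags is non-empty across a batch gives a direct measure of the regression described above.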


Do you need a solution for that, or are you just angry and want to express it?

If it is a simple thing, would you mind posting the prompt or sending it to me in a private message? I swear I don't need it.

We have now downgraded to gpt-4-0125-preview. The results are consistently better than with gpt-4-turbo-2024-04-09 and gpt-4o.
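When downgrading like this, it may be worth guarding against accidentally going back to an undated alias, since OpenAI can repoint a name like gpt-4-turbo to a newer snapshot, while dated names such as gpt-4-0125-preview stay fixed. Below is a small sketch; the date-suffix regex is only a heuristic based on the model names in this thread, and build_chat_request is a hypothetical helper.

```python
import re
from typing import Dict, List

# Heuristic only (an assumption, not an official naming rule): dated snapshots end
# in "-YYYY-MM-DD" or a four-digit "-MMDD", optionally followed by "-preview".
_SNAPSHOT_RE = re.compile(r"-(\d{4}-\d{2}-\d{2}|\d{4})(-preview)?$")


def is_pinned_snapshot(model: str) -> bool:
    """True if the model name looks like a dated snapshot rather than a moving alias."""
    return _SNAPSHOT_RE.search(model) is not None


def build_chat_request(
    messages: List[Dict[str, str]], model: str = "gpt-4-0125-preview"
) -> Dict[str, object]:
    """Build Chat Completions keyword arguments, refusing undated model aliases."""
    if not is_pinned_snapshot(model):
        raise ValueError(f"{model!r} looks like a moving alias; pin a dated snapshot instead")
    return {"model": model, "messages": messages}
```

With the official Python SDK, the result could be passed as client.chat.completions.create(**build_chat_request(messages)).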

The quality is similar to what it used to be with gpt-4-0125-preview a few weeks ago.


