gpt-3.5-turbo-0125 is worse than gpt-3.5-turbo-0613

I have clearly noticed that gpt-3.5-turbo-0125 is worse than gpt-3.5-turbo-0613 (and the versions before it). In many situations it seems to output as little as possible. Two examples:

  • If I send a prompt like “Hello, I’m Nuno. Please let me know (something…)” to 0613 (all previous 3.5 versions behaved the same way) or to gpt-4 turbo (latest version), it replies with something like “Hello Nuno, (reply…)”. Since 0125, however, it no longer greets me and outputs only the plain reply, as short as possible, with no “Hello”. I tried it many times and the result was always the same.

  • Another example: if I ask it to return a piece of text (around 70 words) with slight wording variations, the output on 0613 or gpt-4 used to include all the ideas of the text, with slight changes in how it was written. Now, on 0125, it outputs about half of the ideas expressed in the text and some are simply dropped. Again, it seems to reply with as little as possible.
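For anyone who wants to check these two symptoms on their own replies, here is a minimal sketch. The helper names `greets_by_name` and `word_ratio` are made up for illustration, and the word-count ratio is only a rough proxy for "how many ideas survived", not a real measure of content coverage. It works on reply strings you have already collected, so it makes no API calls.

```python
import re


def greets_by_name(reply: str, name: str) -> bool:
    """Return True if the reply opens with a greeting that mentions the name."""
    # Only look at the start of the reply, case-insensitively.
    opening = reply.strip()[:80].lower()
    has_greeting = re.match(r"(hello|hi|hey)\b", opening) is not None
    return has_greeting and name.lower() in opening


def word_ratio(original: str, rewrite: str) -> float:
    """Word count of the rewrite divided by the word count of the original."""
    original_words = len(original.split())
    if original_words == 0:
        raise ValueError("original text is empty")
    return len(rewrite.split()) / original_words
```

A ratio well below 1.0 on a "rewrite with slight variations" prompt would suggest the reply was shortened, although a human read is still needed to confirm that ideas were actually dropped.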

What is happening with gpt-3.5? Is it getting cheaper with each update but also getting stupid, lazy, …?

I noticed the same. For a classification task and a filter-extraction task, switching from 0613 to 0125 reduced performance significantly. Also, compared to 0613, there is much more variation in the output for the same input, even at temperature 0.
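One way to put a number on the "variation at temperature 0" claim is to send the same prompt several times to each model and measure how often the most common output appears. The sketch below assumes you provide your own `call_model(model, prompt)` function (hypothetical, not part of any library) that calls the API at temperature 0 and returns the reply text. The names `consistency` and `compare_models` are also illustrative.

```python
from collections import Counter
from typing import Callable


def consistency(outputs: list[str]) -> float:
    """Share of runs that produced the most common output (1.0 = identical every time)."""
    if not outputs:
        raise ValueError("no outputs to compare")
    counts = Counter(output.strip() for output in outputs)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(outputs)


def compare_models(
    call_model: Callable[[str, str], str],  # hypothetical: (model, prompt) -> reply text
    models: list[str],
    prompt: str,
    runs: int = 10,
) -> dict[str, float]:
    """Run the same prompt `runs` times per model and report each model's consistency."""
    return {
        model: consistency([call_model(model, prompt) for _ in range(runs)])
        for model in models
    }
```

Exact string matching is strict. For a classification task it is usually appropriate, but for free-text output you may want to normalize or compare extracted labels instead.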

I tried changing the prompts as well, but it is much harder to get them to work with 0125 than with 0613.

2 Likes

Hopefully OpenAI realizes that cheap but useless context length only appeals to the most naive “how can I chat with my pdf” user, and that one-third the cost for one-third the usefulness is not what developers want.

What should be done for developers who want good AI above all else: find a version of gpt-3.5-turbo-0613 as it was in production in June, before they started damaging it further with anti-instruction-following, and reproduce that platform completely on the API as gpt-3.5-turbo-developer.

No janky multi-tool methods with undesired text injection the model can’t understand anyway, which aim to limit what you can send to the AI.

4 Likes