GPT4 (after the 06-13 model) is less smart

marketing5 · June 30, 2023, 12:15pm

I’ve noticed a significant downgrade. I’ve built a translation tool. When translating the same text from English to Dutch, I’ve observed some strange errors that GPT4-0314 doesn’t make at all. GPT4 (0613 and now the default model) appears to be more on par with 3.5-turbo, which is a massive downgrade.

We also notice it with more advanced coding questions. The results are much less usable, there are more errors, and it seems that GPT4 doesn’t understand at the level the 0314 model does.

This is not some minor downgrade, sometimes it misinterprets the entire context of a question. Hopefully things will improve, for now I’ll just use the 0314 model, which is unfortunate, since I did like the speed-improvements.

Foxalabs · June 30, 2023, 12:17pm

Hi, Marketing5

Can you post some examples of the prompts and results that are not to your expectations, if anything is translated could you provide a reason why the new one is bad and the old correct version for the non Dutch speakers, on the code side, just the code will be fine, thank you.

marketing5 · June 30, 2023, 12:45pm

For example, I translated a blog, where this English sentence:
Please check my new Podcast!

Was translated into:
Controleer alstublieft mijn nieuwe Podcast!

The entire context is wrong. “Controleer” would be used if something needs be checked / inspected.

GPT4-0314 translated it to:
Bekijk alsjeblieft mijn nieuwe podcast!

This is a much better interpretation, although not perfect (it means ‘view my podcast’). After a second try with GPT4-0314 it changed it to “Luister naar mijn podcast”, which nailed it.

As of coding, it’s difficult to give a concrete example, since I send a lot of huge codeblocks…
What will happen though, is that the GPT4-0613 model and above seems to take ‘shortcuts’. It will give you only a portion of the code, it’s less complete and it seem to forget a lot of things which the old GPT4 simply doesn’t forget. I tried it side-by-side, and often change to the 0413 because I get frustrated. The 0413 model will give good results almost all the time.

A colleague of mine also had some issues while asking for some advanced changes in a HTML table regarding a RFM lifecycle model (numbers needed to be applied to certain scenarios). The new GPT4 just said: This is to advanced for me. And on the second try it just gave up, or gave some wrong answers. The GPT-0413 model almost completed the task without errors.

Foxalabs · June 30, 2023, 12:51pm

The translation info is certainly interesting, thanks for that.

To the coding part, the new model obeys the system message quite a bit more now, if you are performing coding tasks I find it beneficial to create a coding persona and place that in the system message. Along the lines of “You are an expert computer programmer that specialises {lang} you always produce code to the highest standards and use all industry best practices, when asked for code you produce entire blocks of complete accurate…” etc. etc.

This will set the model with the intent to produce code from an experts perspective and not that of a “helpful assistant”

anon22939549 · June 30, 2023, 3:39pm

To be fair, this is incorrect English as well.

Depending on additional context it could be understood as,

Please check [out] my new podcast!
Please check my new podcast [for errors]!

And while the first it’s more likely to be understood in a colloquial sense, the second is closer to a plain reading of the text.

There may also be a bit of randomness at play here. I’d be curious to see if you were to run this prompt 25 times, how often it goes each way.

Edit: I went ahead and did 10 runs with your prompt and with corrected English.

Please check my new podcast!

Please check out my new podcast!

As you can see, the default model assumes you want someone to view your new podcast only 20% of the time in this sample. Whereas when using proper English, the default model has zero difficulty constructing a proper translation.

Incidentally, it seems even the default gpt-3.5-turbo has no issue doing the translation when phrased properly in English (though I don’t read Dutch and can’t independently verify this).

GPT-3.5-Turbo: Please check out my new podcast!

jwatte · June 30, 2023, 9:30pm

That is exactly what the English sentence means, too. You want to use Check out in English in this context.

jochenschultz · July 1, 2023, 12:04am

What do you mean? From my developers point of view I understand that you want me to check out your new podcast from git.

qrdl · July 1, 2023, 1:30am

If this sort of thing is a real concern for you, your best bet is to do a set of evaluation prompts and save them (right click save as) and then try them again at a later point.

I did this a few months ago. I have yet to see failure, and I reword the prompts to make sure it’s not a caching thing.

And I have no problem calling out OpenAI on anything and everything, so you can be assured I’m not sucking up.

For the above, in english we don’t often say ‘please’ when marketing, though we might say it if we were asking someone to do some editing/verification. The “!” after it is weird, though.

I agree with others, it’s not a good example of failure.

KnowingStop · July 9, 2023, 9:17am

I concur with your viewpoint. I have furnished two supporting pieces of information at Experiencing Decreased Performance with ChatGPT-4 - #157 by KnowingStop.

Topic		Replies	Views
Major Issues in new GPT 3.5 : DO NOT DEPRECATE OLD ONE API gpt-35-turbo , api	24	7436	January 21, 2024
GPT-4 becoming dumber sometimes, for a while API	7	2426	December 18, 2023
Chat GPT4 1106 vs ChatGPT 4: Impressive drop in quality API gpt-4 , chatgpt	27	15192	February 14, 2024
Chat GPT 4 getting worse? API	8	4704	December 17, 2023
I'm going to be honest here: since the release of the GPT-4o updates, ChatGPT has been getting more and more problematic. My GPT responses are not what they used to be Plugins / Actions builders gpt-4 , plugin-development	23	1917	October 25, 2024

GPT4 (after the 06-13 model) is less smart

Related Topics