Possible decline in the format-following and reasoning capabilities of the GPT-3.5 series APIs

The GPT-3.5 series APIs include gpt-3.5-turbo, gpt-3.5-turbo-1106, and gpt-3.5-turbo-0125. Their ability to follow format requirements and to reason appears to have declined. Running the same benchmark, I achieved an accuracy of 0.87 on April 10th and earlier, but today, April 12th, it was only about 0.82. Is this an adjustment made by OpenAI?


The system_fingerprint in an API response is supposed to change when OpenAI makes backend changes to newer models that might affect determinism.

Have you recorded that from prior inputs?
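For anyone who wants to start tracking this, here is a minimal sketch of logging system_fingerprint alongside a benchmark run, using the official openai Python SDK (v1.x). The model, prompt, and seed value are illustrative:

```python
# Minimal sketch: record system_fingerprint with each benchmark run
# so backend changes can be detected. Requires the openai v1.x SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",
    messages=[{"role": "user", "content": "Reply with the single word: ok"}],
    seed=42,  # seed + system_fingerprint together support reproducibility checks
)

# If this value differs between runs, the backend configuration has changed
# and accuracy numbers may no longer be directly comparable.
print(resp.system_fingerprint)
```

Storing that value next to each accuracy score would at least tell you whether a drop coincides with a backend change.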

Stealth modification of AI models is a modus operandi.

OpenAI should find and put back gpt-3.5-turbo-0613 exactly as it was in June – before they did multiple rounds of training and platform modification that broke system instruction following again and again. A “pro developer” edition that is not a “sorry, you only get ChatGPT now” AI.

I am a developer at the beginning stage. I just wrote a chain using LangChain and used LangChain's output parser to generate the format instructions. I have no idea about system_fingerprint and have never recorded it; I just intuitively feel that the performance of these APIs has degraded. I hope that OpenAI can keep a model's performance stable, because reproducible results are needed in many places.
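For context, a minimal sketch of the kind of chain described above, assuming the langchain, langchain-core, and langchain-openai packages; the response schema and question are illustrative, not the poster's actual code:

```python
# Sketch of a LangChain chain whose format instructions come from an
# output parser, as described above. Schema and prompt are examples only.
from langchain.output_parsers import ResponseSchema, StructuredOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

schemas = [ResponseSchema(name="answer", description="the final answer")]
parser = StructuredOutputParser.from_response_schemas(schemas)

prompt = PromptTemplate(
    template="Answer the question.\n{format_instructions}\nQuestion: {question}",
    input_variables=["question"],
    # The parser generates the format instructions injected into the prompt.
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
chain = prompt | llm | parser  # LCEL pipeline: prompt -> model -> parser

result = chain.invoke({"question": "What is 2 + 2?"})
print(result)  # e.g. {"answer": "4"}
```

A chain like this fails loudly when the model stops honoring the format instructions, which is exactly the kind of regression being reported here.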

I found claude-3-haiku to be better than gpt-3.5-turbo-0125 in format adherence and reasoning capability, so I will switch to Claude 3 for now.

I’m seeing a dramatic degradation in reasoning ability too. It seems to have started around the time of the Omni (GPT-4o) launch. This was working last week.