GPT-3.5 Turbo downgraded suddenly?

I am noticing a significant decline in the natural language understanding capabilities of the gpt-3.5-turbo API recently. It is now failing to understand and execute the instructions given in the prompt. Earlier I got good enough accuracy, but now the answers are quite poor.

Another strange error I am noticing is that the model randomly hallucinates if the API is invoked more than 40 times in succession. Even adding the instruction “Restrict to the input text provided and do not add any outside knowledge” doesn’t work.

I have tried using gpt-3.5-turbo-0613 specifically, and yet the same variations and issues keep occurring.

Any suggestions on what is happening and how to rectify this are appreciated.

Thanks

If you can provide your old prompt responses and the new ones so a comparison can be made, that would be great.

Do you have an example of the 40 times in succession issue?

gpt-3.5-turbo == gpt-3.5-turbo-0613

Until December 11, then

gpt-3.5-turbo == gpt-3.5-turbo-1106

if plans stay in place.

edit: quality of answer: true at the time. The models page has since been updated to remove the transition date, likely due to continued issues with multi-language function-calling.

Yes, the AI has been hit again and again with quality degradation, especially in following an API user’s system programming. gpt-3.5-turbo-0301, which dramatically demonstrated the difference, was also recently slapped with changes.

Then they say “these are snapshots” when they are not. The only way a corporate-type can call their newest AI “improved” is to gaslight those who would run benchmarks against the current offerings instead of what they obtained earlier.

edit: quality of answer: true. I have been impacted by overnight changes to the same model name which broke previous functionality. I was told by an OpenAI staff member here that they are “snapshots”, and presenting further evidence ended that communication.

You can try gpt-3.5-turbo-1106; that’s the only other function-calling model, unless you take the path they want you to take and upgrade to GPT-4, which empties your account about 15x faster.

edit: quality of answer: true + personal opinion. The price difference can be seen on the pricing page, and the gpt-3.5 models have been reduced in quality to the point where the only serviceable option in many cases is to switch to gpt-4.


Another technique, which I have yet to give a name or write a paper for, is “auto-corrective iteration” or something like that.

Take the first answer and prompt the AI with "[our system has automatically flagged the provided answer as unsatisfactory or deviating from the parameters or system programming instruction. Paying higher attention, provide a new response directly to the user of higher quality. Do not refer to this iteration or the correction, only provide the high-quality response to the user.]"

Regardless of what it writes the first time.
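In API terms, that looks roughly like the sketch below, assuming the openai Python client; the helper function, its name, and the model choice are illustrative assumptions, not a prescribed implementation:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CORRECTION_PROMPT = (
    "[our system has automatically flagged the provided answer as unsatisfactory "
    "or deviating from the parameters or system programming instruction. Paying "
    "higher attention, provide a new response directly to the user of higher "
    "quality. Do not refer to this iteration or the correction, only provide the "
    "high-quality response to the user.]"
)

def corrected_answer(system_prompt: str, user_input: str,
                     model: str = "gpt-3.5-turbo-0613") -> str:
    """Ask once, then unconditionally ask the model to redo its own answer."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]
    first = client.chat.completions.create(model=model, messages=messages)
    first_text = first.choices[0].message.content

    # Feed the first answer back with the corrective instruction,
    # regardless of whether it actually looked wrong.
    messages += [
        {"role": "assistant", "content": first_text},
        {"role": "user", "content": CORRECTION_PROMPT},
    ]
    second = client.chat.completions.create(model=model, messages=messages)
    return second.choices[0].message.content
```

Only the second response is shown to the end user.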

Thanks for suggesting this.

I assumed you meant I should append the original prompt and answer with this message and then get the response back. Unfortunately, the second response was also a wrong output.

I tried the same in the ChatGPT UI (3.5 version) as well. There it also made these mistakes, and when I asked it to explain its answer, it started with “I apologize for the oversight…”. However, the same does not work well in the API, as the second output is quite dynamic and doesn’t follow the structure I want it to.

I tried including a quality-check condition in the initial prompt itself, where I asked it to review its generated output before returning it, but that doesn’t work either, and it continues to make the same mistake over and over.

It’s really getting annoying!

It’s hard to give examples verbatim. However, the prompt I am trying extracts certain entity names from the input text. It works well initially, but after a certain number of iterations, say 40 (not sure of the exact count), it starts giving made-up entity names which are not in the text. I have explicitly instructed in the prompt not to use any outside knowledge and to restrict to the input only. Yet these strange issues keep happening.
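For illustration only (no verbatim example was shared), the setup being described is roughly the following loop; the prompt wording, model, and variable names here are assumptions:

```python
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "Extract the entity names mentioned in the input text. "
    "Restrict to the input text provided and do not add any outside knowledge. "
    "Return one name per line."
)

def extract_entities(text: str) -> list[str]:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-0613",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
        temperature=0,  # keep extraction as deterministic as possible
    )
    content = response.choices[0].message.content
    return [line.strip() for line in content.splitlines() if line.strip()]

# The reported problem: calling this in a loop works at first, then
# fabricated names start appearing after roughly 40 calls.
# results = [extract_entities(doc) for doc in documents]
```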

OK, so any time you start getting made-up results from the model, it is almost always down to missing context. Somehow or other, I think your context is getting lost by the 40th attempt.

Put in some debug output and log the exact prompt that is getting sent each time; I think you’ll find something unplanned happening to the data.
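A sketch of that kind of logging, reusing the illustrative extraction setup from above (the log file name and prompt wording are assumptions); the point is to capture the exact messages payload for every call so call 1 can be diffed against call 40:

```python
import json
import logging

from openai import OpenAI

logging.basicConfig(filename="prompt_audit.log", level=logging.INFO)
client = OpenAI()

SYSTEM_PROMPT = (
    "Extract the entity names mentioned in the input text. "
    "Restrict to the input text provided and do not add any outside knowledge."
)

def extract_entities_logged(text: str, call_number: int) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": text},
    ]
    # Log exactly what is sent: an empty, truncated, or unexpectedly
    # grown input by the 40th call will show up here.
    logging.info("call %d request: %s", call_number, json.dumps(messages))

    response = client.chat.completions.create(
        model="gpt-3.5-turbo-0613",
        messages=messages,
        temperature=0,
    )
    answer = response.choices[0].message.content
    logging.info("call %d response: %s", call_number, answer)
    return answer
```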