I am noticing a significant decline in the natural language understanding capabilities of the gpt-3.5-turbo API recently. It now fails to understand and execute the instructions given in the prompt. Earlier I got good enough accuracy, but now the answers are quite poor.
Another strange error I am noticing is that the model randomly hallucinates if the API is invoked more than 40 times in succession. Even adding the instruction “Restrict to the input text provided and do not add any outside knowledge” doesn’t work.
I have tried using gpt-3.5-turbo-0613 specifically, yet the above variations and issues keep occurring.
Any suggestions on what is happening and how to rectify this are appreciated.
Yes, the AI has been hit again and again with quality degradation, especially in following an API user’s system programming. gpt-3.5-turbo-0301, which dramatically demonstrated the difference, was also recently slapped with changes.
Then they say “these are snapshots” when they are not. The only way a corporate type can call their newest AI “improved” is to gaslight those who would run benchmarks against the current offerings instead of against what they obtained earlier.
You can try gpt-3.5-turbo-1106, the only other function-calling model, unless you take their preferred path: upgrading to GPT-4 and emptying your account 15x faster.
Another technique, which I have yet to name or write a paper for, is something like “auto-corrective iteration.”
Take the first answer, then prompt the AI: "[our system has automatically flagged the provided answer as unsatisfactory or deviating from the parameters or system programming instruction. Paying higher attention, provide a new response directly to the user of higher quality. Do not refer to this iteration or the correction, only provide the high-quality response to the user.]"
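A minimal sketch of how that two-pass flow could be wired up. The message-building helper below is hypothetical (names like `build_retry_messages` are mine, not from any library); the resulting list would be passed to your usual chat-completions call as the second request.

```python
# The corrective instruction, quoted verbatim from the post above.
CORRECTION_PROMPT = (
    "[our system has automatically flagged the provided answer as unsatisfactory "
    "or deviating from the parameters or system programming instruction. "
    "Paying higher attention, provide a new response directly to the user of "
    "higher quality. Do not refer to this iteration or the correction, only "
    "provide the high-quality response to the user.]"
)

def build_retry_messages(system_prompt, user_prompt, first_answer):
    """Rebuild the conversation with the model's first answer included,
    followed by the corrective instruction, so the second pass can see
    and revise its own output."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": first_answer},
        {"role": "user", "content": CORRECTION_PROMPT},
    ]

# You would then send build_retry_messages(...) as the `messages` argument
# of a second chat-completions request and return that answer to the user.
```

The key design point is that the assistant's first answer is replayed back to it as conversation history, rather than asking it to "try again" with no record of what it got wrong.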
I assumed you meant I should append the original prompt and answer with this message and then get the response back. Unfortunately, the second response was also a wrong output.
I tried the same in the ChatGPT UI (3.5 version) as well. It made these mistakes there too, and when I asked it to explain its answer, it started with “I apologize for the oversight…”. However, the same approach does not work well in the API: the second output is quite dynamic and doesn’t follow the structure I want.
I also tried including a quality-check condition in the initial prompt itself, asking it to review its generated output before returning it, but that doesn’t work either, and it continues to make the same mistake over and over.
It’s hard to give examples verbatim. However, the prompt I am trying extracts certain entity names from the input text. It works well initially, but after a certain number of iterations, say 40 (not sure of the exact count), it starts giving made-up entity names which are not in the text. I have explicitly instructed it in the prompt not to use any outside knowledge and to restrict itself to the input only. Yet these strange issues keep happening.
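Since the failure mode here is made-up entity names, one defense that doesn't depend on the model following instructions is a post-hoc check: discard any extracted entity that does not literally appear in the input text. A minimal sketch (the function name `filter_hallucinated` and the exact-substring matching policy are my assumptions; fuzzier matching may be needed if the model paraphrases names):

```python
def filter_hallucinated(entities, source_text):
    """Keep only entities that literally occur in the source text,
    compared case-insensitively. Anything the model invented from
    outside knowledge is dropped instead of being returned to the user."""
    lowered = source_text.lower()
    return [e for e in entities if e.strip().lower() in lowered]

# Example: "Globex" was never in the text, so it is treated as hallucinated.
text = "Acme Corp signed a deal with Initech last week."
extracted = ["Acme Corp", "Initech", "Globex"]
clean = filter_hallucinated(extracted, text)  # ["Acme Corp", "Initech"]
```

This won't stop the model from hallucinating, but it guarantees that hallucinated names never reach downstream code, and a shrinking `clean` list is also a cheap signal for when to retry or reset the call.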