Why does hallucination increased after the fixes made on 16th Oct

For a given use-case of mine, I had formulated a prompt which also used function calling in it. This prompt was working perfectly fine until the 16th Oct night. After the fix which was made by OpenAI due to the elevated error rate, I have noticed that the exact same prompt has stopped working. Hallucinations have increased drastically.

Is anyone aware of the reason for this? And is there any other possible solution for this?

  1. what “fix”?
  2. what model?
  3. what prompt?

The answer is basically “there is only one function-calling model, and we will hit it with whatever quality degradation we feel like, without comment. Good luck with that.”

I am using gpt3.5-turbo-16k-0613 model.
For reference, I am attaching the prompt:-


The customer is interested in buying the iPhone 14 phone, now you need to perform the below steps one at a time:
1. You need to state the benefits of the Apple Care Plus product which are mentioned in triple backticks.
2. Ask the customer if they are interested in buying the Apple Care Plus product along with iPhone 14.

Apple Care Plus Benefits:
    Extended warranty
    Door-to-door service
    Priority Service

Always ensure that if the customer is interested in buying Apple Care Plus then you must call the interested_in_applecare_decision function.
Always ensure that if the customer is not interested in buying Apple Care Plus then you must call the not_interested_in_applecare_decision function.
Always ensure that if the customer is not interested in buying the iPhone 14 phone then you must call the not_interested_buying_iphone_decision function.

This was working fine but after the fix it is not, why is this happening and how can be fixed

First, there is no reason to pay double if you don’t need the context length of -16k. If it is really possible for you to exceed -4k with your application, you can make a more intelligent model-select mechanism in your software.

You should find a way to minimize the system prompt. The functions should have a clear description and names so they can stand alone on their own merit.

gpt-3.5 models have been hit with quality degradation for following system instructions going back a month. You cannot fix this. (They broke GPT-4, and thus had to go after gpt-3.5 that still worked and embarrassed the 30x more expensive model?)

Thanks, yes in our use case it is possible to exceed 4k.

Okay will minimise the prompt but why suddenly does this stop working?

It stops working because OpenAI continues to integrate new fine-tuning into models and continues to “optimize” them to give them the minimum appearance of operating properly with minimum computation.

Okay, So If one prompt is working correctly now and if there are any new fixes made to the model then there are high chance that the prompt that was working before should stop working now.

Yes, you need to make your system message, functions, and user message encapsulation as clear, simple and distinct as possible, so that it is robust against any further model tuning, and not already running at the edge of AI ability to understand what’s going on.

You can also get more reliable operations by including an API parameter such as top_p = 0.3 (or lower) so you are only generating the most likely path of language completion, to avoid unexpected tokens even when model perplexity increases.

Thanks for the help will try with top_p = 0.3