Why is ChatGPT so moody, refusing to answer the exact same question it answered previously?

I’m using ChatGPT 3.5 (on the waiting list for 4) to get responses for travel schedules in JSON format. It usually works, but sometimes, and quite often today, the response is “Sorry, as an AI language model I cannot…”.

You might be able to improve the prompt. For generating code, I’ve found that “make an app that…” fails in a similar way to what you describe, but “output the code for an app that…” succeeds.

More generally, perhaps you can try giving instructions the model is clearly allowed to follow (“output the JSON for X”) rather than the more informal “do X”.

Thanks for your help. I have done some of that, but I will try some more variations on the prompt. The problem is that once I find one that works, it works for quite a while until it doesn’t, so it’s hard to test a solution when the response varies. It also complains in different ways. Sometimes it says that it can’t answer because, as an AI language model, it does not have any experience or opinions, and sometimes it says that as an AI language model it can reply in text, and then it does. Currently it claims not to be able to answer in JSON even though it can and it has. In the first case (when it says that it doesn’t have any experience) it does provide a JSON response, so it’s just a matter of removing the extra text in front of the JSON string.
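
In case it helps, the clean-up for that first case looks roughly like this (illustrative Python; my program isn’t actually written in Python, and extract_json is just a name I made up). It finds the first “[” or “{” in the reply and parses the JSON from there, ignoring any apology text in front of it:

```python
import json

def extract_json(reply: str):
    """Parse the first JSON value in the reply, skipping any leading text."""
    # Find where the JSON appears to start ("[" for the schedule array, "{" just in case).
    starts = [i for i in (reply.find("["), reply.find("{")) if i != -1]
    if not starts:
        return None  # no JSON at all, e.g. a plain-text refusal
    try:
        # raw_decode parses the first JSON value and ignores anything after it.
        value, _ = json.JSONDecoder().raw_decode(reply[min(starts):])
        return value
    except json.JSONDecodeError:
        return None
```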

Responses also depend on the ‘temperature’ setting.

cheers!

I’m using these parameters:

```
"model": "gpt-3.5-turbo",
"messages": [{"role": "user", "content": $q}],
"temperature": 0,
"max_tokens": 3048,
"top_p": 1,
"frequency_penalty": 0,
"presence_penalty": 0
```
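
The actual call isn’t written in Python, but in Python terms it amounts to roughly this (get_schedule is just an illustrative name; the request body matches the parameters above):

```python
import os
import requests

def get_schedule(question: str) -> str:
    """Send one prompt to the Chat Completions endpoint and return the reply text."""
    resp = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": "gpt-3.5-turbo",
            "messages": [{"role": "user", "content": question}],
            "temperature": 0,
            "max_tokens": 3048,
            "top_p": 1,
            "frequency_penalty": 0,
            "presence_penalty": 0,
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```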

Should I change, or remove any of them or add others?

Thanks!

Maybe work with it a little in the playground first, or maybe you’re already doing that? As you already know, it seems to be very dependent on the prompt. I’d have to confirm this in the tech docs, but I’d guess that when a prompt has a number of plausible responses, even temp=0 will show some random scatter. LOL, normal physics doesn’t apply!

Thanks for helping. I have tried it in the playground as well as many times in the app. The problem is that it can work correctly for quite a while, even several days in a row, until suddenly… something is different. I characterize it as ChatGPT having “multiple personalities”, and it may be a byproduct of the way that neural networks work, or maybe the load on the system, or some other factor. In my app I allow users to personalize their vacation by changing the prompt, e.g. adding “Dog friendly” as a modifier, and I notice the problem more in those cases. Understandably the input is then different, even if only slightly, but not so different that the model should refuse to provide the response in JSON format.

It’s all about phrasing your words.

It’s pretty much exactly telling you what you’re doing wrong.
Re-phrase your question to not be opinionated or reflective.

Q. playa del carmen or cancun beach, which would you prefer?
A. As an AI language model, I do not have personal preferences or experiences, but I can provide you with some information about both

Q. What are the differences between playa del carmen, and cancun beaches? What is more popular among young tourists with dogs?
A. Playa del Carmen and Cancun are both popular beach destinations in Mexico, but they have some differences in terms of atmosphere[…]

The prompt is sent via API from inside a program, so I can’t change the prompt based on the response, at least not easily. A typical sample question is:
Output a detailed 1-day vacation schedule for Florence Italy starting on Mar 27. Provide the response in RFC8259 compliant JSON, following this format.
[{
'Date': 'the date to visit the attraction, in Month Day format',
'Time': 'the time to visit the attraction, in 24 hour format',
'Duration': 'the amount of time to visit the attraction, in minutes',
'Attraction': 'the name of the attraction',
'City': 'the city in which the attraction is located',
'Category': 'the category of the attraction',
'Description': 'a description of the attraction'
}]

This works in the sandbox and in the program “almost” all of the time.
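
When it does answer in JSON, a rough sanity check along these lines (illustrative Python; looks_like_schedule is a made-up name) would at least catch replies that come back with missing fields or in a different shape than requested:

```python
REQUIRED_KEYS = {"Date", "Time", "Duration", "Attraction", "City", "Category", "Description"}

def looks_like_schedule(data) -> bool:
    """Check that the parsed reply is a non-empty list of objects with the fields the prompt asks for."""
    return (
        isinstance(data, list)
        and len(data) > 0
        and all(isinstance(item, dict) and REQUIRED_KEYS <= item.keys() for item in data)
    )
```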

Interesting.

Can you add onto this program? Perhaps a second prompt to adjust the question to not be opinionated? There’s no way that the following prompt that I made is efficient, but it may be something to build on?

If you are going to answer something that begins with “As an AI…” to the following question, instead please re-format the question to be less opinionated so that it can be answered properly.
<|SEP|>
“playa del carmen or cancun beach, which would you prefer?”

Response: Which destination, Playa del Carmen or Cancun, has more to offer in terms of beach activities and amenities?

If you are going to answer something that begins with “As an AI…” to the following question <— This part is probably completely useless and would probably be better as “if the question is opinionated, blah blah blah”. Worth testing.
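
As a rough Python sketch of that two-pass idea (chat() stands in for whatever function sends a single prompt and returns the reply text; the instruction wording is only a starting point):

```python
from typing import Callable

REPHRASE_INSTRUCTION = (
    "If the following question is opinionated or would normally be answered with "
    '"As an AI...", re-format it so that it can be answered properly, and return '
    "only the re-formatted question. Otherwise return the question unchanged."
)

def ask_neutralized(question: str, chat: Callable[[str], str]) -> str:
    """First pass neutralizes the question, second pass sends the neutralized version as the real prompt."""
    neutral = chat(f'{REPHRASE_INSTRUCTION}\n<|SEP|>\n"{question}"')
    return chat(neutral)
```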

There are 2 problem categories. When it responds with “Sorry, as an AI language model I don’t have experience…” it turns out (at least from the samples I’ve seen) that after saying that, it still does respond correctly in JSON, so the solution is to simply remove the text before the JSON response. However, when it responds with “Sorry, as an AI language model I can’t…” it also continues to respond but in text format, and not in JSON format. This one is more complicated to fix and opens up other possible problems. I could try to insist on it responding in JSON, but I would like to find a way to prevent the problem from occurring. Like they say, an ounce of prevention is worth a pound of cure.

Seems like you’re stuck between a rock and a hard place.

The immediate solution to me would be to “catch” or filter these types of responses, and then re-feed them modified. But you’re saying that you cannot (easily) do that.

I’m not too sure what your current setup is that you cannot modify a prompt after it’s sent out. To me this already seems like a very concerning situation. What about malicious prompts? What about spam? What if the response isn’t the expected JSON object? Is this all managed by whatever you are using to process the prompt?

Realistically, if you are requiring a very delicate format such as a compliant JSON object, you should be pre-processing and post-processing your prompt and results.

I wouldn’t even care if OpenAI said: “Hey, GPT-5 now is 100% perfect at responding with JSON!”. I would still be confirming that before accepting it.

This could be as simple as restructuring the prompt to avoid the “As an AI…” response, and then confirming that the reply is just a compliant JSON object. Otherwise, if it sends text along with the JSON object, your pipeline will be ruined and your JSON file or database could become corrupted.
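
As a sketch of what I mean (Python for illustration; chat is a stand-in for whatever function sends one prompt and returns the reply text, and ask_for_json is a made-up name):

```python
import json
from typing import Callable

def ask_for_json(prompt: str, chat: Callable[[str], str], max_attempts: int = 2):
    """Accept only replies containing parseable JSON; otherwise re-feed once with a firmer reminder."""
    decoder = json.JSONDecoder()
    for _ in range(max_attempts):
        reply = chat(prompt)
        # Skip any leading "As an AI..." text and try to parse the first JSON value.
        starts = [i for i in (reply.find("["), reply.find("{")) if i != -1]
        if starts:
            try:
                value, _ = decoder.raw_decode(reply[min(starts):])
                return value
            except json.JSONDecodeError:
                pass
        prompt += "\nRespond with RFC8259 compliant JSON only, with no text before or after it."
    return None  # caller decides whether to show an error or try a re-phrased prompt
```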

Can you change the prompt before the response? I imagine you must have some sort of front-end service like a website or chatbot that takes in this information first. Surely you can work on that and clean it?

I do have a way to catch these types of responses and I could re-feed them, but what would I change the prompt to when it fails to give me JSON? Each response takes 20-30 seconds, so the user will get tired of waiting if I try too many times.

I just got the email that says that I can use GPT-4. I will try that to see if it resolves or minimizes this issue.

Good question.

Not really sure without knowing more. Lots of testing and analytics most likely.