I have a custom trained model that is doing the same. In the playground it's perfect, but via an API call I get strange, low-quality responses.

2 Likes

Same issue: using the exact same params, the playground gives a different (and better) response than the API.

2 Likes

Same issue. I am using ada and babbage for classification. The playground returns the result, but the SDK returns a blank string.

2 Likes

Same issue with text-davinci-003. My prompt has instructions to identify appointments in email threads for automated scheduling, followed by the actual thread. The playground does great, but the API returns incorrect answers, many of them out of format. Both have temperature set to 0.

3 Likes

Same problem here, prompts and parameters the same: consistent results in the playground, something else via the API. This stops me from actually building. Any workarounds? Has anyone tried the paid API?

3 Likes

We experience the same with fine-tuned models. Completions differ massively between the playground and the API, with the API being much worse.

3 Likes

We experience the same problem with the gpt-4 model.
Maybe the playground is adding some information to the request that we aren't including in ours.

3 Likes

I'm having the same issue with the prompt ChatGPT suggests. I'm trying to generate a conversation between 4 people. Using the playground, the results are great. Using the API, the results are much worse, if it even works at all on some requests.

3 Likes

Same here. I am using DaVinci3 for autoformalization (translating sentences into formal logic). My prompt works well in the playground, but via the API it frequently returns completely wrong answers, making the intended application unusable.

2 Likes

Hmmm… Same here. Not from this account but from my company account. Using the gpt-4 API (or gpt-3.5-turbo, it doesn't matter), temperature is of course 0.0, but we are not getting consistent responses to the exact same prompt. Does anyone know how to solve this issue? :face_with_spiral_eyes:

3 Likes

Yeah, observing the same thing. I am trying out gpt-3.5-turbo. The responses in the playground work pretty well, but with the API it's way off.

The responses are very unstable for some reason on the API. I get that temperature is supposed to bring randomness to the answer, but this is not randomness. Does anyone have an explanation and a solution?

3 Likes

Want to also add that this is happening to me using GPT-3.5-Turbo.

I am trying to format the response in a certain way. My prompt includes steps to follow for the output; one of the steps can also be skipped depending on a condition. The prompt also includes four example inputs and outputs. In the playground the output is more or less fine, but when calling the API, when the condition to skip one of the steps is met, it doesn't skip it.

3 Likes

There may or may not be a slight difference between the playground and the API. (ChatGPT is an entirely different beast, just to clarify.)

The key word is slight. Maybe. Not confirmed. If there is a difference, it would be slight and mostly unnoticeable.

If there is a large difference, most likely there are some formatting issues going on. If you don't share your actual request, we can't help with your problem.

The most common issue I see comes from incorrectly formatted multi-line strings:

>>> prompt = """do this
...          and do that"""
>>> prompt
'do this\n         and do that'
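
If you want the prompt indented in your source for readability, one workaround (a minimal sketch, assuming Python) is textwrap.dedent, which strips the common leading whitespace before the text ever reaches the API:

import textwrap

# Both lines are indented in the source, but dedent removes the shared
# leading whitespace, so the API sees clean text with no stray spaces.
prompt = textwrap.dedent("""\
    do this
    and do that""")

print(repr(prompt))  # 'do this\nand do that'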
2 Likes

I am using the code from the code window in the playground, my inputs have no line breaks, and the results are still very different (and far worse) via the API.

2 Likes

I have the same issue, even when I use Azure OpenAI. I'm using the system message to instruct the model to respond in a specific JSON format. It works perfectly with GPT-3.5-Turbo in both the Azure and OpenAI playgrounds, but via the API it ignores the whole JSON instruction most of the time. With GPT-4 it seems to work better via the API, but still not as well as it does in the playground. It seems there is something missing in the API request that the playground adds.
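
For what it's worth, a pattern that tends to make a JSON instruction stick via the raw API is to put the format contract in the system message and then validate the reply, so malformed output is caught instead of passed downstream. A minimal sketch, assuming the openai Python package (v1-style client); the schema and prompt text here are made-up examples:

import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [
    # The system message carries the format contract (example schema only).
    {"role": "system", "content": 'Reply only with JSON of the form {"summary": "<string>"}.'},
    {"role": "user", "content": "Summarize: the meeting moved to Friday at 3pm."},
]

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=messages,
    temperature=0,
)

raw = response.choices[0].message.content
try:
    data = json.loads(raw)  # fails loudly if the model ignored the JSON instruction
except json.JSONDecodeError:
    data = None  # log and retry instead of passing bad output downstream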

3 Likes

The API is way worse. It keeps repeating itself even with frequency_penalty set to 2, while the playground responds exactly the way you would expect.

3 Likes

Same for me. I tried to fine-tune the model for sentiment analysis and received the expected outcome in the playground. However, when using the API, the results seem to come from the original, un-fine-tuned model.

3 Likes

Is there still no response about why this happens, how and when it will be fixed, or what one can do about it? I, for one, have now put my projects on hold, since the API performance – in contrast to the playground performance – is simply unusable for my purposes.

4 Likes

For everyone stuck with the same problem - try this: make sure that the message with role: “system” goes first in the list of messages in your request.

The system prompt always seems to go as the very first message when sent from the playground. I was creating a message list with the user message first and then appending the system prompt later if needed.

Now the quality and consistency of the results I'm getting from the API are comparable to what I see in the playground.
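
To make that concrete, here is a minimal sketch of the ordering, assuming the openai Python package (v1-style client; the older openai.ChatCompletion interface takes the same messages list). The system text is just a placeholder:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [
    # System message first, the same way the playground sends it.
    {"role": "system", "content": "You are an assistant that answers concisely."},
    # User (and assistant) turns follow in conversation order.
    {"role": "user", "content": "Hello!"},
]

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=messages,
    temperature=0,
)
print(response.choices[0].message.content)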

I don't know if this will work for other people, but I'm using Python to access the API and found that two things helped. The biggest was making sure I had the latest openai package installed (pip install openai --upgrade), and the second was switching from f-strings to raw strings when feeding in system/user prompts. Hopefully this helps other people too.
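
The post above doesn't say exactly what the f-strings were doing wrong, but two common ways they alter prompt text are brace interpolation and backslash escapes, so the string that reaches the API isn't the text you see in your editor. A small illustration (the variable names are made up):

user_name = "Alice"

f_prompt = f"Hello {user_name}\nReturn JSON"     # braces interpolate, \n becomes a real newline
raw_prompt = r'Hello {user_name}\nReturn JSON'   # raw string: everything passes through literally

print(repr(f_prompt))    # 'Hello Alice\nReturn JSON'
print(repr(raw_prompt))  # 'Hello {user_name}\\nReturn JSON'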