Different results from OpenAI API and Playground

We experience the same with fine-tuned models. Completions differ massively between the playground and the API, with the API being far worse.

5 Likes

We experience the same problem with the gpt-4 model.
Maybe the playground is adding some information to the request that we don’t include in ours.

4 Likes

I’m having the same issue. I’m using the prompt ChatGPT suggests, trying to generate a conversation between 4 people. Using the playground, the results are great. Using the API, the results are much worse, if it works at all on some requests.

4 Likes

Same here. I am using DaVinci3 for autoformalization (translating sentences into formal logic). My prompt works well in the playground, but via the API it frequently returns completely wrong answers, making the intended application unusable.

2 Likes

Hmmm… same here, though not from this account but from my company account. Using the gpt-4 API (or gpt-3.5-turbo, it doesn’t matter), with temperature of course at 0.0, we are not getting a consistent response with the exact same prompt. Does anyone know how to solve this issue? :face_with_spiral_eyes:

4 Likes

Yeah, observing the same thing. I am trying out gpt-3.5-turbo. The responses in the playground work pretty well, but with the API they are way off.

The responses from the API are very unstable for some reason. I agree that temperature is supposed to bring randomness to the answer, but this is not randomness. Does anyone have an explanation and a solution?

4 Likes

I want to add that this is also happening to me with GPT-3.5-Turbo.

I am trying to format the response in a certain way. My prompt includes steps to follow for the output; one of the steps can be skipped depending on a condition. The prompt also includes four example inputs and outputs. In the playground the output is more or less fine, but when calling the API, whenever the condition to skip one of the steps occurs, it doesn’t skip it.

3 Likes

There may or may not be a slight difference between the playground and the API. Just to clarify, ChatGPT is an entirely different beast.

The key word is slight. Maybe. Not confirmed. If there is a difference, it would be slight and mostly unnoticeable.

If there is a large difference, most likely there are some formatting issues going on.
If you don’t share your request, we cannot help with your problem.

The most common issue that I see is from incorrectly formatting multi-line strings.

>>> prompt = """do this
...          and do that"""
>>> prompt
'do this\n         and do that'
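
If that multi-line pattern is the problem, here is a minimal sketch of two common ways to keep a prompt free of accidental indentation (the prompt text is illustrative only):

    import textwrap

    # Option 1: dedent() removes the leading whitespace common to all lines.
    # The backslash after the opening quotes avoids a leading newline.
    prompt = textwrap.dedent("""\
        do this
        and do that""")
    assert prompt == "do this\nand do that"

    # Option 2: implicit string concatenation with explicit newlines.
    prompt = (
        "do this\n"
        "and do that"
    )
    assert prompt == "do this\nand do that"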
2 Likes

I am using the code from the code window in the playground, my inputs have no line breaks, and the results are very different (and far worse in the API).

2 Likes

I have the same issue, even when I use Azure OpenAI. I’m instructing the model via the system message to respond in a specific JSON format. It works perfectly with GPT-3.5-Turbo in both the Azure and OpenAI playgrounds, but via the API it ignores the whole JSON instruction most of the time. With GPT-4 it works better via the API too, but still not as well as it does in the playground. It seems there is something missing in the API request that the playground adds.
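
For reference, a sketch of the kind of system message I mean; the JSON shape and field names here are made up for illustration:

    messages = [
        {
            "role": "system",
            "content": (
                "You are an assistant that replies ONLY with valid JSON "
                'of the form {"sentiment": "...", "score": 0.0}. '
                "Do not output any text outside the JSON object."
            ),
        },
        {"role": "user", "content": "The delivery was late and the box was damaged."},
    ]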

4 Likes

The API is way worse. It keeps repeating itself even with frequency_penalty set to 2, while the playground responds exactly as you would expect.

3 Likes

Same for me. I tried to fine-tune the model for sentiment analysis and received the expected outcome in the playground. However, when using the API, the results seem to come from the original, un-fine-tuned model.

3 Likes

Is there still no reaction on why this happens, how and when it will be fixed or what one can do about it? I, for one, have now put my projects on hold, since the API performance – in contrast to the playground performance – is simply unusable for my purposes.

5 Likes

For everyone stuck with the same problem, try this: make sure that the message with role “system” goes first in the list of messages in your request.

The system prompt always seems to go as the very first message when sent from the playground. I was creating the message list with the user message first and then appending the system prompt later when needed.

Now the quality and consistency of results I’m getting from the API is comparable to what I see in playground.
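
In code, a minimal sketch of the fix using the pre-1.0 openai Python package (the model name and prompt text are placeholders):

    import openai

    openai.api_key = "sk-..."  # your API key

    # The system message goes first, before any user messages.
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize this ticket in one line."},
    ]

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
        temperature=0,
    )
    print(response["choices"][0]["message"]["content"])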

I don’t know if this will work for other people, but I’m using Python to access the API and found that two things helped. The biggest was making sure I had the latest openai package installed (pip install openai --upgrade); the second was switching from f-strings to raw strings when feeding in system/user prompts. Hopefully this helps other people too.
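
If it helps, a quick illustration of where f-strings can mangle a prompt (the values are made up):

    name = "Alice"

    # In an f-string, { } is interpolation syntax, so literal braces
    # (e.g. a JSON example inside the prompt) must be doubled.
    prompt_f = f'Reply as JSON: {{"user": "{name}"}}'
    # -> Reply as JSON: {"user": "Alice"}

    # In a raw string, backslashes stay literal: \d and \n here are
    # two characters each, not escape sequences.
    prompt_raw = r"Match the pattern \d+\n exactly"

    print(prompt_f)
    print(prompt_raw)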

I had the same issue with slightly different results. I can’t say how much this affected it, but the playground had a newline after the system prompt, and the results were better. I added it to the API call, and the results were similar.

2 Likes

Which of those formats is correct? I’ve attempted both with no success.

Are you asking about which type of string formatting and indentation is correct within Python?

Here is a format that will make your hard-coded message clear within the code:

    printed_prompt = "input your question: "  # irrelevant example

    #here is adding a system message at an indented code position
    system_message_list = [{"role": "system", "content":
"""

You are an AI assistant. The goals of an AI assistant:
- do what I say;
- don't lie;
- don't deny.

# important
If you think the user is wrong, see above.

""".strip()
    }]

Being inside a Python list or dictionary, we have implicit line continuation, so I am able to break the “content” key and its value onto separate lines (and the indentation doesn’t matter except for readability).

Additionally, I use a triple-quoted docstring, where the newlines in your code are maintained and carried through into the string.

With a docstring, a common failing is to put a newline immediately after the first triple quotes. That places an unintended newline character at the start of your string. Secondly, any indentation seen within the string is really in the string: it needs to be hard left unless you want that indentation passed along.

Finally, I call .strip() on the docstring. That strips any leading or trailing whitespace, including the accidentally inserted kind. That means I can spread things out for easy readability and not be concerned about extra spaces and line feeds at the start and end.
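
A quick interpreter session showing both the leading-newline pitfall and what .strip() does about it:

    >>> s = """
    ... hello"""
    >>> s
    '\nhello'
    >>> s.strip()
    'hello'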

1 Like

So we have found a solution. This is going to sound insane, but it has worked for us.
Replace ‘backslash n’ with ‘backslash backslash n’ and your problems will be fixed.
I don’t know what the Transformer lord is doing, but that is how it got resolved for us. And why does it work in the playground? Because if you check its code, you will find every single backslash being replaced by a double backslash.
And if it works for you, laugh at the AGI as we did :wink:
Edit: somehow the community does not let me put a double backslash and reduces it to a single backslash, so I am spelling the symbol out as words.
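
In Python terms, what we did looks like this; to be clear, this mirrors our workaround, not a documented fix:

    # The literal replacement described above: turn each real newline
    # into the two characters backslash + n before sending the prompt.
    prompt = "line one\nline two"
    escaped = prompt.replace("\n", "\\n")
    print(escaped)  # line one\nline two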

1 Like

I’ve just put your post through a hex viewer and I cannot see any difference between \n and \n, did one of them have some escape codes associated with them before the forum markdown system got hold of it?