Different results from the OpenAI API and the Playground

Hi,

I’m sending in a completion request for a Q&A. I’m providing an email that we received and asking what the revenue of the company mentioned is. I am using ‘text-babbage-001’.
In the Playground I get the correct answer, but when using the API it randomly adds 811,000 to the amount.
Also, if the revenue isn’t mentioned, it returns 811,000 as the amount.
Any ideas why?
When I use curie, it seems to be OK (for a specific example; I didn’t run it on the whole set).

Thanks!

12 Likes

@at-bay-ds-team I’m curious if you found a solution, since I have the same problem. I literally copy the prompt string, paste it into the Playground, and get a (consistent) answer. My local code gets a different, equally consistent answer.

I have gone so far as to use the Chrome developer tools to inspect the settings used in the HTTP POST and make sure they match, with no effect.

I understand that there is some inherent randomness and that answers aren’t always the same, but this seems like something else is going on.
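
For anyone else trying to compare the two, here is a minimal debugging sketch (assuming the legacy 0.x openai Python library, where setting openai.log = "debug" is supposed to print request details): dump the exact parameters your code sends so you can diff them field-by-field against the payload the Playground sends, which you can see in the browser dev tools.

import json
import openai

openai.api_key = "sk-..."   # your API key
openai.log = "debug"        # 0.x library: logs outgoing request details

params = dict(
    model="text-babbage-001",
    prompt="...",            # paste the exact prompt used in the Playground
    temperature=0,
    max_tokens=100,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
)
print(json.dumps(params, indent=2))   # compare with the dev-tools payload
response = openai.Completion.create(**params)
print(response["choices"][0]["text"])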

4 Likes

Babbage is not very good at getting answers from embeddings; I would stick with Davinci. A few other things:

  1. What prompt are you giving to the bot? You need to say something like “If the question can’t be answered based on the knowledge base, respond with ‘Unknown’.”
  2. What temperature are you using? Set it to zero.
  3. Make sure you are using the right engine to create embeddings for the knowledge base and the question respectively. These are two different engines (see the sketch below).
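
A rough sketch of what I mean by point 3 (I’m assuming the legacy search-embedding model family here; the passage text is just an example, so adjust to whatever you actually use):

import openai

openai.api_key = "sk-..."

# Legacy search-embedding models come in doc/query pairs; the knowledge-base
# passages and the user's question must each go to the matching engine.
doc_embedding = openai.Embedding.create(
    model="text-search-babbage-doc-001",      # embed knowledge-base passages
    input="Example passage: Acme Corp reported revenue of $12M last fiscal year.",
)["data"][0]["embedding"]

query_embedding = openai.Embedding.create(
    model="text-search-babbage-query-001",    # embed the user's question
    input="What was the revenue for the last fiscal year?",
)["data"][0]["embedding"]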
3 Likes

It doesn’t seem to me that randomness is the issue here. I am consistently getting the same answer in the Playground, and consistently getting the same wrong answer through the API.

3 Likes

Babbage is giving the right answer in the Playground, and I am using the code provided there.
At this point I will have to use a different model, but the question remains: why does the difference occur?

This is the code I am using (openai_model and text are set earlier in my script):

import openai

# openai_model is the engine name ("text-babbage-001" here); text is the email body
response = openai.Completion.create(
    model=openai_model,
    prompt=f"{text}\nQ: what was the revenue for the last fiscal year?\n\nA:",
    temperature=0,
    max_tokens=100,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    stop=["\n"],
)

Temperature is 0.
I am new to OpenAI; how would you implement your suggestions (points 1 and 3)?

1 Like

Try doing this: whatever question the user is asking, append to it “If you can’t find the answer in the provided context, say ‘Unknown’.” Let me know how that goes.
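
Roughly like this (just a sketch of the idea; the exact wording and model are up to you):

import openai

openai.api_key = "sk-..."

text = "..."   # the email / context you are asking about
question = "What was the revenue for the last fiscal year?"

# Append an explicit fallback instruction so the model answers "Unknown"
# instead of inventing a number when the context doesn't contain the answer.
prompt = (
    f"{text}\n"
    f"Q: {question} "
    "If you can't find the answer in the text above, say \"Unknown\".\n\n"
    "A:"
)

response = openai.Completion.create(
    model="text-babbage-001",
    prompt=prompt,
    temperature=0,
    max_tokens=100,
    stop=["\n"],
)
print(response["choices"][0]["text"].strip())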

1 Like

I should have clarified what I was using! I am using the completion API from Python and these settings:

        model="text-davinci-002",
        prompt=prompt,
        temperature=0,
        max_tokens=256,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
        best_of=1,
        stop=["\n"]

It always gives an answer that is right-ish, just different, so it isn’t a problem of the model not knowing.

I wonder if the Playground gets “newer” or “test” builds before they roll out?

2 Likes

Just try changing the prompt as I suggested and let me know.

1 Like

I tried this, but then it just gave “Unknown” for everything.

1 Like

I tried this with a different prompt using davinci and it works. :smile:
This still doesn’t explain the discrepancy between the Playground and the API…

5 Likes

I have the exact same problem with text-davinci-003 now. I wanted to generate emails, and the API consistently makes up that I studied Computer Science even though I said I studied Math. The Playground got the Math part consistently right. Exact same prompt, and the temperature is 0 in both cases.

5 Likes

I am just seeing this thread. I had the same thing. I copied a prompt back and forth from the Playground to double-check, and consistently got the intended answer in the Playground and the “wrong” answer when I called the code provided by the Playground with literally the same prompt.

2 Likes

So odd. I don’t know why this happens. The biggest problem for me is that the API still makes stuff up with temperature 0. Somehow temperature 0 for the API is not the same as temperature 0 for the Playground.
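
One way to narrow it down (a debugging sketch only, assuming the legacy Completion API’s logprobs parameter): request token logprobs at temperature 0 and compare them with what the Playground shows under “Show probabilities”. If the top tokens differ for the exact same prompt, the two requests aren’t really hitting the same model and parameters.

import openai

openai.api_key = "sk-..."

response = openai.Completion.create(
    model="text-davinci-002",
    prompt="...",        # the exact prompt pasted from the Playground
    temperature=0,
    max_tokens=20,
    logprobs=5,          # return the top-5 token logprobs at each position
)

choice = response["choices"][0]
for token, top in zip(choice["logprobs"]["tokens"], choice["logprobs"]["top_logprobs"]):
    print(repr(token), dict(top))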

2 Likes

This is happening to me now too, trying to use text-davinci-002. The results are coming back a little sanitized, as if either the temperature is being turned down or it’s actually using text-davinci-003 instead.

2 Likes

I’m struggling with this exact thing. I’m trying to reverse-engineer prompts from poems, and the API and the Playground have the exact same prompt and settings. But no matter which poem I input, the API response is always: ‘Prompt: Write a poem about the beauty of nature and how it can bring peace and joy to our lives.’
When I use the Playground it generates prompts as I expect :\

5 Likes

I have a custom-trained model that is doing the same. In the Playground it’s perfect; via an API call I get weird, low-quality responses.

4 Likes

Same issue: using the exact same params, the Playground gives a different (and better) response than the API.

3 Likes

Same issue. I am using ada and babbage for classification. The Playground returns a result, but the SDK returns a blank string.

2 Likes

Same issue with text-davinci-003. My prompt has instructions to identify appointments in email threads for automated scheduling, followed by the actual thread. The Playground does great, but the API returns incorrect answers, many of them out of format. Both have the temperature set to 0.

3 Likes

Same problem here: prompts and parameters are the same, consistent results in the Playground, something else from the API. This stops me from actually building. Any workarounds? Has anyone tried the paid API?
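
One workaround worth trying (just a sketch, not a guaranteed fix): bypass the SDK and hit the completions endpoint directly with requests, copying the payload field-for-field from the Playground’s network request. If the raw call matches the Playground but the SDK doesn’t, the difference is in what the SDK sends; if it still differs, it’s something server-side.

import requests

API_KEY = "sk-..."   # your API key

payload = {
    "model": "text-davinci-003",
    "prompt": "...",              # the exact prompt copied from the Playground
    "temperature": 0,
    "max_tokens": 256,
    "top_p": 1,
    "frequency_penalty": 0,
    "presence_penalty": 0,
}

resp = requests.post(
    "https://api.openai.com/v1/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])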

4 Likes