Major difference in output: playground vs API

I know this is commonly discussed, but I can’t find an example that mirrors what I’m encountering. I have been testing a prompt that focuses on writing style, includes examples, and then asks for an answer to a user’s question with context provided.

In playground, the results consistently follow the instructions (at least reasonably well!) - but when deployed via the API the results are radically different, so much so that it sounds like no writing style was given. It is as if all the voice personalization disappears. Has anyone seen this or have any tips on what to check?

I’ve already confirmed the prompts are identical, using the same model, temperature, top P, etc. I’ve successfully deployed a very similar system for a few other applications - this seems to be a new problem as of this week.

I am having the same problem and I don’t have a solution unfortunately…

Mine is probably more radical: the prompts work in the playground, but with the API it just returns ‘I cannot fulfill the request’, which is very frustrating. And I checked to make sure the model, temperature, maximum tokens, etc. are the same.

Are you using langchain + 3.5 turbo 1106? Because that is my setup. langchain + 3.5 turbo works perfectly fine, but not turbo 1106.

Hi, have you been able to understand why the results from Playground and API are different? I find the differences mainly when using extensions.

I have not seen anything similar yet, but in my cases before it has been some difference in what is actually sent. Although it has probably never been as big a difference as you are describing.

While I can’t help directly, here are some pointers to double-check that I have previously missed at one point or another. Also for others who may experience this issue.

Check especially the roles for each message. Newlines should be \n, with no \r characters.
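
For example, something along these lines (just a sketch; style_prompt and user_question stand in for whatever you actually send):

def normalize(text: str) -> str:
    # drop carriage returns so only \n newlines are sent
    return text.replace("\r\n", "\n").replace("\r", "\n")

style_prompt = "You write in a dry, witty voice..."  # placeholder for your real style prompt
user_question = "How do I ...?"                      # placeholder for the user's input

messages = [
    {"role": "system", "content": normalize(style_prompt)},
    {"role": "user", "content": normalize(user_question)},
]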

Sounds like you are sending the chat history. Just in case, double-check that the chat history is actually sent, including the system prompt. It is easy to miss setting the system prompt for every call.
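
Roughly like this (a sketch with names of my own; the point is that the system prompt and the stored history go into every single request):

history = []  # prior user/assistant turns, persisted by your application between calls

def build_messages(system_prompt, user_input):
    # the API is stateless, so the system prompt and history must be re-sent on every call
    return ([{"role": "system", "content": system_prompt}]
            + history
            + [{"role": "user", "content": user_input}])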

Try the View code button to see exactly what the Playground is sending, then compare it line by line with your code.
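
One way to do that comparison (assuming the standard openai Python client; the payload fields here are just illustrative):

import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
payload = {
    "model": "gpt-3.5-turbo-1106",
    "messages": [{"role": "user", "content": "Hello"}],  # whatever you actually send
    "temperature": 1,
    "top_p": 1,
}
# print the exact request body and diff it against the Playground's "View code" output
print(json.dumps(payload, indent=2, ensure_ascii=False))
response = client.chat.completions.create(**payload)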

Thanks. Yes, I have compared with the same content. I am comparing the playground and the API for both Azure OpenAI /chat/completions?api-version=2023-07-01-preview and extensions/chat/completions?api-version=2023-07-01-preview.

The extensions endpoint does work, but, for example, I only get a portion of the response via the API compared to what I get in the playground. It seems there is something being POSTed in the playground that I am not seeing in the playground “view code”.

Azure vs OpenAI playground? You are not comparing the same models.

Microsoft is happy to use the models as delivered.

OpenAI instead has Nerf Team Alpha on standby ready to dump new fine-tuning into the models to curtail output, and architectural devops ready to try the newest ablation and quantization techniques to eke the most out of the little compute they are allocated.

Also, try at top-p = 0.0001
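
i.e. something like this (illustrative only; client and messages set up as usual, and swap in your own model):

response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    messages=messages,
    top_p=0.0001,  # near-greedy sampling, so playground and API should pick almost identical tokens
)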

Hi, thanks for helping. Just to clarify, I am comparing the Azure OpenAI playground and the Azure OpenAI API. I will keep digging and trying to fine-tune things to understand what the Azure OpenAI playground is doing differently. Thanks.

I haven’t used either, but you might check whether your deployment ID is identical and is the one the Azure playground actually uses. I could see results differing if the playground is running against a model that is not your own deployment from a particular point in time.

Ultimately, you get what you deployed, and it’s either make that work or chart a new course (or a new datacenter, etc.).

I had the same problem and found a solution this way:

I looked at View code on the playground and compared, and noticed that in my API call I was sending an empty system message (with an empty string), whereas the playground was sending no system message at all. After making that change, the API output complied and responded as expected.
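
Roughly this difference (reconstructed; the variable names here are mine):

# What my code was sending -- an empty system message:
messages = [
    {"role": "system", "content": ""},   # empty string; this is what degraded the output
    {"role": "user", "content": "..."},
]

# What the playground sends when the system box is left blank -- no system message at all:
messages = [
    {"role": "user", "content": "..."},
]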

Hey, hi. I too use the code from the playground, as shared in the picture in the conversation above, but it still does not generate the same results as in the playground. Here is the code piece; please help me get the same results:
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="ft:gpt-3.5-turbo-1106:personal::8fnxqkawLD",
    messages=[
        {
            "role": "user",    # the role can be 'system', 'user', or 'assistant'
            "content": Body,   # Body holds the user's input text
        }
    ],
    temperature=0.5,
    max_tokens=256,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
)

chat_response = response.choices[0].message.content.strip()  # no trailing comma, otherwise this becomes a tuple

On the playground it gives good responses, but this code in the application does not give the desired results. Please help me get the same results as in the playground.