Major difference in output: playground vs API

I know this is commonly discussed, but I can’t find an example that mirrors what I’m encountering. I have been testing a prompt that focuses on writing style, includes examples, and then asks for an answer to a user’s question with context provided.

In playground, the results consistently follow the instructions (at least reasonably well!) - but when deployed via the API the results are radically different, so much so that it sounds like no writing style was given. It is as if all the voice personalization disappears. Has anyone seen this or have any tips on what to check?

I’ve already confirmed the prompts are identical, using the same model, temperature, top P, etc. I’ve successfully deployed a very similar system for a few other applications - this seems to be a new problem as of this week.

I am having the same problem and I don’t have a solution unfortunately…

Mine is probably more radical: the prompts work in the playground, but with the API it just returns ‘I cannot fulfill the request’, which is very frustrating. And I checked to make sure the model, temperature, maximum tokens, etc. are the same.

Are you using langchain + 3.5 turbo 1106? Because that is my setup. langchain + 3.5 turbo works perfectly fine, but not turbo 1106.

Hi, have you been able to understand why the results from Playground and API are different? I find the differences mainly when using extensions.

I have not seen anything similar yet, but in my cases before it has been some difference in what is actually sent. Although it has probably never been as big a difference as you are describing.

While I can’t help directly, here are some pointers to double-check that I have previously missed at one point or another. Also for others who may experience this issue.

Check especially the roles for each message. Newlines should be \n, with no \r characters.
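
For example, something along these lines (just a sketch; style_prompt and user_question stand in for whatever you actually send):

def normalize(text: str) -> str:
    # drop carriage returns so only \n newlines are sent
    return text.replace("\r\n", "\n").replace("\r", "\n")

style_prompt = "You write in a dry, witty voice..."  # placeholder for your real style prompt
user_question = "How do I ...?"                      # placeholder for the user's input

messages = [
    {"role": "system", "content": normalize(style_prompt)},
    {"role": "user", "content": normalize(user_question)},
]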

Sounds like you are sending the chat history. Just in case, double-check that the chat history is actually sent, including the system prompt. It is easy to miss setting the system prompt for every call.
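
Roughly like this (a sketch with names of my own; the point is that the system prompt and the stored history go into every single request):

history = []  # prior user/assistant turns, persisted by your application between calls

def build_messages(system_prompt, user_input):
    # the API is stateless, so the system prompt and history must be re-sent on every call
    return ([{"role": "system", "content": system_prompt}]
            + history
            + [{"role": "user", "content": user_input}])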

Try the View code button to see exactly what the Playground is sending, then compare it line by line with your code.
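
One way to do that comparison (assuming the standard openai Python client; the payload fields here are just illustrative):

import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
payload = {
    "model": "gpt-3.5-turbo-1106",
    "messages": [{"role": "user", "content": "Hello"}],  # whatever you actually send
    "temperature": 1,
    "top_p": 1,
}
# print the exact request body and diff it against the Playground's "View code" output
print(json.dumps(payload, indent=2, ensure_ascii=False))
response = client.chat.completions.create(**payload)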

Thanks. Yes, I have compared with the same content. I am comparing the playground and the API for both Azure OpenAI /chat/completions?api-version=2023-07-01-preview and extensions/chat/completions?api-version=2023-07-01-preview.

The extensions endpoint does work, but, for example, I only get a portion of the response via the API compared to what I get in the playground. It seems there is something being POSTed in the playground that I am not seeing in the playground “view code”.

Azure vs OpenAI playground? You are not comparing the same models.

Microsoft is happy to use the models as delivered.

OpenAI instead has Nerf Team Alpha on standby ready to dump new fine-tuning into the models to curtail output, and architectural devops ready to try the newest ablation and quantization techniques to eke the most out of the little compute they are allocated.

Also, try at top-p = 0.0001
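
i.e. something like this (illustrative only; client and messages set up as usual, and swap in your own model):

response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    messages=messages,
    top_p=0.0001,  # near-greedy sampling, so playground and API should pick almost identical tokens
)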

Hi, thanks for helping. Just to clarify, I am comparing the Azure OpenAI playground and the Azure OpenAI API. I will keep digging and trying to fine-tune things to understand what the Azure OpenAI playground is doing differently. Thanks.

I haven’t used either, but you might check whether your deployment ID is identical and is the one the Azure playground actually uses. I could see results differing if the playground is running against a model that is not your own deployment from a particular point in time.

Ultimately, you get what you deployed, and it’s either make that work or chart a new course (or a new datacenter, etc.).

I had the same problem and found a solution this way:

I looked at View code on the playground and compared, and noticed that in my API call I was sending an empty system message (with an empty string), whereas the playground was sending no system message at all. After making that change, the API output complied and responded as expected.
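
Roughly this difference (reconstructed; the variable names here are mine):

# What my code was sending -- an empty system message:
messages = [
    {"role": "system", "content": ""},   # empty string; this is what degraded the output
    {"role": "user", "content": "..."},
]

# What the playground sends when the system box is left blank -- no system message at all:
messages = [
    {"role": "user", "content": "..."},
]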

Hey, hi. I too use the code from the playground, as shared in the picture in the conversation above, but it still does not generate the same results as in the playground. Here is the code piece; please help me get the same results:
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="ft:gpt-3.5-turbo-1106:personal::8fnxqkawLD",
    messages=[
        {
            "role": "user",    # the role can be 'system', 'user', or 'assistant'
            "content": Body,   # Body holds the user's input text
        }
    ],
    temperature=0.5,
    max_tokens=256,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
)

chat_response = response.choices[0].message.content.strip()  # no trailing comma, otherwise this becomes a tuple

On the playground it gives good responses, but this code in the application does not give the desired results. Please help me get the same results as in the playground.