Why do OpenAI’s Assistant and Chat Completion models produce different outputs for identical prompts when analyzing visual content?
It depends.
Temperature is a parameter you can use to introduce variability into the output. Set it to 0 and you get a "greedy grab": always take the most likely next token. Fair warning: an exact temperature of 0 is often special-cased under the hood or overridden entirely, so the "greedy grab" rule for `temperature: 0` isn't universally true.
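For reference, this is roughly how you would pin the temperature to 0 on a Chat Completions call with an image input. It assumes the official `openai` Python SDK; the model name and image URL are just placeholders:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Ask a vision-capable chat model to analyze an image with temperature
# pinned to 0 (greedy-ish decoding). Model name and URL are placeholders.
resp = client.chat.completions.create(
    model="gpt-4o",
    temperature=0,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```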
However, even with a temperature of 0 you can still see some variability. I don't think the reason for this has ever been confirmed, but the top two contenders, in my view, are:
- Floating point rounding: Any system that has to do heavy arithmetic on long floating point numbers is subject to rounding, which produces very small, typically negligible differences in the output (see the short demo after this list).
- Parallelization: Getting tokens out as fast as possible requires parallel hardware, and different systems with different clocks and scheduling end up doing the arithmetic in different orders, which again produces very small, mostly unnoticeable numerical differences.
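As a tiny illustration of why operation order matters, plain Python floats are enough to show the effect; the token/logit framing in the comments is my own assumption about how this plays out inside a model:

```python
# Floating point addition is not associative: summing the same values in a
# different order gives a (very slightly) different result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6
print(left == right)  # False

# When billions of such additions are spread across parallel hardware, the
# summation order can differ from run to run. If two candidate tokens'
# logits are nearly tied, that tiny discrepancy can flip which one is the
# "most likely" token, so even greedy decoding can diverge.
```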
If you are noticing a big difference even with a temperature of 0, you could also run N completions and then use a final model call to synthesize the results into one answer.
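A minimal sketch of that N-completions-then-synthesize idea, again assuming the `openai` Python SDK; the model name, prompt, and synthesis instruction below are placeholders, and merging the drafts with one final call is just one way to combine the runs:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-4o"   # placeholder: any chat model you have access to
N = 5

question = "Describe the key elements in this image."  # placeholder prompt

# 1) Run N independent completions at temperature 0.
drafts = []
for _ in range(N):
    resp = client.chat.completions.create(
        model=MODEL,
        temperature=0,
        messages=[{"role": "user", "content": question}],
    )
    drafts.append(resp.choices[0].message.content)

# 2) Ask one final call to synthesize the N drafts into a single answer.
synthesis_prompt = (
    "Combine these analyses into a single consistent answer:\n\n"
    + "\n\n---\n\n".join(drafts)
)
final = client.chat.completions.create(
    model=MODEL,
    temperature=0,
    messages=[{"role": "user", "content": synthesis_prompt}],
)
print(final.choices[0].message.content)
```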