Why do OpenAI’s Assistant and Chat Completion models produce different outputs for identical prompts when analyzing visual content?
It depends.
Temperature is a parameter you can use to introduce variability into the output. Set it to 0 and you get a "greedy grab": always take the most likely next token. Fair warning: an exact temperature of 0 is often special-cased under the hood or overridden entirely, so the "greedy grab" rule for `temperature: 0` isn't universally true.
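For reference, this is roughly how you would pin the temperature to 0 on a Chat Completions call with an image input. It assumes the official `openai` Python SDK; the model name and image URL are just placeholders:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Ask a vision-capable chat model to analyze an image with temperature
# pinned to 0 (greedy-ish decoding). Model name and URL are placeholders.
resp = client.chat.completions.create(
    model="gpt-4o",
    temperature=0,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```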
However, even with a temperature of 0 you can still see some variability. I don't think the reason for this has ever been confirmed, but the top two contenders, in my view, are:
- Floating point rounding: Any system that has to do heavy arithmetic on long floating point numbers is subject to rounding, which produces very small, typically negligible differences in the output (see the short demo after this list).
- Parallelization: Getting tokens out as fast as possible requires parallel hardware, and different systems with different clocks and scheduling end up doing the arithmetic in different orders, which again produces very small, mostly unnoticeable numerical differences.
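As a tiny illustration of why operation order matters, plain Python floats are enough to show the effect; the token/logit framing in the comments is my own assumption about how this plays out inside a model:

```python
# Floating point addition is not associative: summing the same values in a
# different order gives a (very slightly) different result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6
print(left == right)  # False

# When billions of such additions are spread across parallel hardware, the
# summation order can differ from run to run. If two candidate tokens'
# logits are nearly tied, that tiny discrepancy can flip which one is the
# "most likely" token, so even greedy decoding can diverge.
```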
If you are noticing a big difference even with a temperature of 0, you could also run N completions and then use a final model call to synthesize the results into one answer.
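A minimal sketch of that N-completions-then-synthesize idea, again assuming the `openai` Python SDK; the model name, prompt, and synthesis instruction below are placeholders, and merging the drafts with one final call is just one way to combine the runs:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-4o"   # placeholder: any chat model you have access to
N = 5

question = "Describe the key elements in this image."  # placeholder prompt

# 1) Run N independent completions at temperature 0.
drafts = []
for _ in range(N):
    resp = client.chat.completions.create(
        model=MODEL,
        temperature=0,
        messages=[{"role": "user", "content": question}],
    )
    drafts.append(resp.choices[0].message.content)

# 2) Ask one final call to synthesize the N drafts into a single answer.
synthesis_prompt = (
    "Combine these analyses into a single consistent answer:\n\n"
    + "\n\n---\n\n".join(drafts)
)
final = client.chat.completions.create(
    model=MODEL,
    temperature=0,
    messages=[{"role": "user", "content": synthesis_prompt}],
)
print(final.choices[0].message.content)
```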