Inconsistent and Inaccurate Responses from GPT-4o

Hello everyone,

I’m currently experiencing an issue with GPT-4o where the responses I receive are inconsistent and often inaccurate. This is surprising to me as I have the temperature set to 0 and the seed set to 9999, both of which should theoretically reduce variability in the responses.

For context, I’m providing the same instructions that yield correct and consistent results when using GPT-4-Turbo. My goal is to minimize costs, which is why I’m trying to utilize GPT-4o. However, the discrepancies in the responses from GPT-4o are causing significant challenges.

Here are the specifics:

Model: GPT-4o
Temperature: 0
Seed: 9999
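
For anyone who wants to reproduce this, here is a minimal sketch of how I'm passing these parameters, using the official `openai` Python SDK. The prompt, run count, and helper names are placeholders, not my exact production code:

```python
from collections import Counter

def check_consistency(outputs):
    """Given one output string per run, return (is_consistent, counts)."""
    counts = Counter(outputs)
    return len(counts) == 1, dict(counts)

def sample_runs(prompt, n=5, model="gpt-4o", seed=9999):
    """Call the API n times with identical parameters and collect the outputs.

    Requires the `openai` package and OPENAI_API_KEY in the environment,
    so the import is kept local to this function.
    """
    from openai import OpenAI
    client = OpenAI()
    outputs = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model=model,
            temperature=0,  # greedy decoding
            seed=seed,      # best-effort determinism, per the docs
            messages=[{"role": "user", "content": prompt}],
        )
        # Per the reproducible-outputs docs, determinism is only expected
        # while system_fingerprint stays the same across runs.
        print(resp.system_fingerprint)
        outputs.append(resp.choices[0].message.content)
    return outputs
```

Calling `check_consistency(sample_runs("your prompt here"))` then tells you whether all runs agreed and, if not, how the outputs were distributed.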

Has anyone else encountered similar issues with GPT-4o? Are there any recommended solutions or workarounds to achieve consistent and accurate results similar to what I get with GPT-4-Turbo?

Any insights or advice would be greatly appreciated. Thank you!


Hi, I am experiencing a similar issue: both gpt-4-turbo and gpt-4o-mini return inconsistent results across runs with the same seed, temperature, prompt, etc. The returned system fingerprint is also identical across those runs. Happy to share more info if someone is interested. This happens very frequently. It would be great if someone from OpenAI could confirm whether this is expected, although the documentation suggests otherwise: https://platform.openai.com/docs/advanced-usage/reproducible-outputs