Is gpt-4o consistency affected over time?

Hello,

I’ve encountered a recurring issue with GPT-4o in Azure OpenAI. The model’s behavior seems to change from week to week, even though I’m using the same model version (gpt-4o-2024-08-06) under identical conditions.

Here’s the context:

  1. Consistent Prompts: The prompts I use haven’t changed.
  2. Stable Environment: Python package versions remain the same.
  3. Model Temperature: Set to 0 for deterministic outputs (see the call sketch after this list).
  4. Tested Prompts: I’ve tested these prompts with multiple few-shot configurations in the past, and they produced consistent results.
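
For context, the calls look roughly like this. This is a minimal sketch using the standard `openai` Python client against an Azure deployment; the endpoint, key, API version, deployment name, and prompt text are placeholders, not my actual values:

```python
import os
from openai import AzureOpenAI

# Placeholder endpoint/key/API version -- not my actual values.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-08-01-preview",
)

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # deployment name pointing at the gpt-4o-2024-08-06 snapshot
    messages=[
        {"role": "system", "content": "You are a routing assistant."},  # illustrative prompt
        {"role": "user", "content": "Route this request: <example input>"},
    ],
    temperature=0,  # intended to make outputs deterministic
)
print(response.choices[0].message.content)
```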

However, week by week, the model responds differently to the same inputs. This affects my routing and response-generation use cases and forces me to tweak prompts regularly, because certain cases no longer produce the expected results.

Is it normal for the model to exhibit such variability? Are there any updates or changes being made to the model behind the scenes that might affect this?

I would appreciate any insights or clarification on this matter.

Thank you!


Hello, we are facing the same issue. Did you ever find out what the problem might be?

Setting temperature or top_p to 0 doesn’t guarantee deterministic or consistent outputs; some run-to-run variation is inherent to how these models are served, and any LLM will have some rate of inaccuracy. It’s unlikely the model itself has actually been altered. You may consider tightening your prompt or using a fine-tuned model to improve accuracy.
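
If you want to check whether the serving backend for your deployment has changed between runs, one option is to pass a fixed seed and log the system_fingerprint returned with each response. This is a sketch, assuming your API version exposes the seed request parameter and the system_fingerprint field; the endpoint, key, and prompt are placeholders:

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<your-key>",                                       # placeholder
    api_version="2024-08-01-preview",
)

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # your deployment name
    messages=[{"role": "user", "content": "Same prompt as before"}],  # placeholder prompt
    temperature=0,
    seed=42,  # best-effort reproducibility, not a hard guarantee
)

# If this value differs between weeks, the serving configuration changed,
# which can shift outputs even with temperature=0 and a fixed seed.
print(response.system_fingerprint)
print(response.choices[0].message.content)
```

Comparing fingerprints over time at least tells you whether output drift coincides with a backend change or is just ordinary sampling variance.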