Is gpt-4o consistency affected over time?

Hello,

I’ve encountered a recurring issue with GPT-4o in Azure OpenAI. The model’s behavior seems to change from week to week, even though I’m using the same model version (gpt-4o-2024-08-06) under identical conditions.

Here’s the context:

  1. Consistent Prompts: The prompts I use haven’t changed.
  2. Stable Environment: Python package versions remain the same.
  3. Model Temperature: Set to 0 for deterministic outputs (see the call sketch after this list).
  4. Tested Prompts: I’ve tested these prompts with multiple few-shot configurations in the past, and they produced consistent results.
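
For context, the calls look roughly like this. This is a minimal sketch using the standard `openai` Python client against an Azure deployment; the endpoint, key, API version, deployment name, and prompt text are placeholders, not my actual values:

```python
import os
from openai import AzureOpenAI

# Placeholder endpoint/key/API version -- not my actual values.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-08-01-preview",
)

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # deployment name pointing at the gpt-4o-2024-08-06 snapshot
    messages=[
        {"role": "system", "content": "You are a routing assistant."},  # illustrative prompt
        {"role": "user", "content": "Route this request: <example input>"},
    ],
    temperature=0,  # intended to make outputs deterministic
)
print(response.choices[0].message.content)
```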

However, week by week, the model responds differently to the same inputs. This affects my routing and response-generation use cases and forces me to tweak prompts regularly, because certain cases no longer produce the expected results.

Is it normal for the model to exhibit such variability? Are there any updates or changes being made to the model behind the scenes that might affect this?

I would appreciate any insights or clarification on this matter.

Thank you!


Hello, we are facing the same issue. Did you ever find out what the problem might be?

Setting temperature or top_p to 0 doesn’t guarantee deterministic or consistent outputs; some run-to-run variation is inherent to how these models are served, and any LLM will have some rate of inaccuracy. It’s unlikely the model itself has actually been altered. You may consider tightening your prompt or using a fine-tuned model to improve accuracy.
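
If you want to check whether the serving backend for your deployment has changed between runs, one option is to pass a fixed seed and log the system_fingerprint returned with each response. This is a sketch, assuming your API version exposes the seed request parameter and the system_fingerprint field; the endpoint, key, and prompt are placeholders:

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<your-key>",                                       # placeholder
    api_version="2024-08-01-preview",
)

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # your deployment name
    messages=[{"role": "user", "content": "Same prompt as before"}],  # placeholder prompt
    temperature=0,
    seed=42,  # best-effort reproducibility, not a hard guarantee
)

# If this value differs between weeks, the serving configuration changed,
# which can shift outputs even with temperature=0 and a fixed seed.
print(response.system_fingerprint)
print(response.choices[0].message.content)
```

Comparing fingerprints over time at least tells you whether output drift coincides with a backend change or is just ordinary sampling variance.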