Stochastic nature of GPT-4 Turbo

Hi Community,

You may have witnessed this already in your own trials, so I’m hoping to gain some insight. I’m using GPT-4 Turbo on an Azure PTU for a RAG system, and in a recent trial I noticed that when I ask the same question with the same retrieved context on the same endpoint, I get different responses even with temperature set to 0.

Any idea on why this may be happening and how to make the responses more consistent?

Make sure you also set these params to be sure:

    top_p: 1,
    frequency_penalty: 0,
    presence_penalty: 0,

Other than that, I’d suggest giving it more granular data or a more granular task. You may be giving it data that is too broad, or asking a question that is too broad, for it to give a consistent outcome. I have managed to get pretty consistent outcomes about 90–98% of the time. If you’re looking for it to respond the same way 100% of the time, you should feed it data and a task where it can narrow its answers down to a small subset, e.g. truthy values or numeric values.

If you want it to summarize something the exact same way every time I haven’t figured out how to do that.
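To make the advice above concrete, here is a minimal sketch of a request payload with every sampling parameter pinned, so that only model-side nondeterminism remains. The deployment name, system prompt, and question here are hypothetical placeholders, not taken from the thread:

```python
# Sketch: pin all sampling parameters for maximum consistency.
# The model/deployment name below is a hypothetical placeholder.

def build_chat_request(system_prompt: str, user_question: str, context: str) -> dict:
    """Assemble a chat-completions payload with sampling fully pinned."""
    return {
        "model": "gpt-4-turbo",   # hypothetical deployment name
        "temperature": 0,          # greedy decoding
        "top_p": 1,                # no nucleus truncation
        "frequency_penalty": 0,
        "presence_penalty": 0,
        "messages": [
            {"role": "system", "content": system_prompt},
            # Keep retrieved context and question byte-identical between runs;
            # even trailing whitespace changes can alter the completion.
            {"role": "user", "content": f"{context}\n\n{user_question}"},
        ],
    }

payload = build_chat_request(
    "Answer only from the provided context.",
    "What is the refund policy?",
    "<retrieved chunks go here>",
)
```

Newer API versions also expose a best-effort `seed` parameter plus a `system_fingerprint` in the response for detecting backend changes, which is worth trying if your API version supports it.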


Hello, thanks for the quick response. I’ll double-check the code, but those params should already be set to the values you highlighted.

I understand the stochastic nature of LLMs, so there’ll be some variability. However, what’s interesting is that earlier, when we were testing GPT-4 Turbo on multiple PayGo endpoints rather than a single PTU endpoint, the responses with the same implementation (parameters, system prompt, retrieved context, etc.) were more consistent.

In some cases, just adding a period at the end of the user question seemed to make responses on the single PTU endpoint more consistent.