Does a lower temperature make the answer more correct?

If I set the temperature to 0, can I significantly reduce ChatGPT's hallucinations? In other words, does a lower temperature make the answer more correct?

It will be more accurate and less creative, but it will still depend on the training data.

If you set the temperature to 0, the model’s responses will likely be more accurate with respect to factual information, as it will choose the most probable word at each step based on the training data. However, this could also make the responses sound robotic and less human-like, with less natural-sounding language.

It’s best to find what kind of temperature suits you best.

Hope this answers your question.


Hi, according to the API reference, temperature controls the randomness of the output. If you set the temperature to 0 and run the model with a given prompt, you can be fairly confident of getting the same completion each time, provided the model, the prompt, and temperature = 0 all stay the same.

That said, if you get a wrong answer from the model with temperature = 0, you are very likely to keep getting the same wrong answer with the same prompt and temperature = 0.

I would look at the question this way: temperature is about how deterministic you want the completion to be for a given prompt; it’s not necessarily linked to the correctness of the completion’s content.

If you want a higher chance of getting correct answers, especially if you want the answer to come from specific sources, you can perhaps use embeddings, which let you draw explicitly on external knowledge.
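
For anyone wondering what the embeddings suggestion looks like in practice, here is a minimal sketch of the retrieval step. The three-dimensional vectors, the passages and the question are all made up for illustration; real embeddings would come from an embedding model and have far more dimensions, and `most_relevant` is a hypothetical helper, not a library function.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_relevant(query_vector, documents):
    """Return the text of the document whose vector is closest to the query."""
    best = max(documents, key=lambda doc: cosine_similarity(query_vector, doc["vector"]))
    return best["text"]

# Made-up vectors standing in for real embeddings (assumption for illustration).
documents = [
    {"text": "Passage about temperature and sampling.", "vector": [0.9, 0.1, 0.0]},
    {"text": "Passage about billing and quotas.", "vector": [0.0, 0.2, 0.9]},
]
query_vector = [0.8, 0.2, 0.1]  # hypothetical embedding of the user's question

# Put the retrieved passage into the prompt so the model answers from it.
context = most_relevant(query_vector, documents)
prompt = (
    "Answer using only this source:\n"
    f"{context}\n\n"
    "Question: What does temperature control?"
)
```

The idea is that the model is asked to answer from the retrieved passage rather than from whatever its training data makes most probable, which is a separate lever from temperature.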


Setting temp to zero results in ‘greedy’ token selection, i.e. picking the token with the highest logit value. That doesn’t necessarily mean ‘most accurate’. It simply means ‘highest probability next token’. The problem is: given what training corpus? Imagine you trained only on Alice in Wonderland. Does ‘highest probability’ align with ‘most accurate in the real world’ in that case?
GPT was trained on a huge corpus. Not all of it is ‘accurate’ in the sense of ‘commonly accepted factual knowledge as of today’. Presumably some of the corpus includes English translations of ancient Greek medical texts… see the problem?
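
A rough sketch of what temperature does to the next-token distribution, in case it helps. This follows the textbook definition (divide the logits by the temperature, then apply softmax) and treats 0 as a special case meaning greedy selection; that is an assumption for illustration, not a description of how OpenAI's servers implement it. The logits and function names are hypothetical.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; handle 0 as greedy instead")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pick_token(logits, temperature, rng=random):
    """Greedy pick when temperature is 0, otherwise sample from the scaled distribution."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical logits for three candidate next tokens.
logits = [2.0, 1.0, 0.1]

greedy_index = pick_token(logits, 0)          # always the highest-logit token
sharp = softmax_with_temperature(logits, 0.1) # nearly all mass on one token
flat = softmax_with_temperature(logits, 2.0)  # mass spread across tokens
```

Note that nothing here knows whether the highest-logit token is true. Lowering the temperature only concentrates probability on whatever the model already ranks first.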


Btw, one way to ameliorate this is to add ‘according to scientific consensus’ or ‘according to accepted fact’ or something like that to your query. But notice I said ‘ameliorate’, not eliminate. :slight_smile:



I don’t think temp 0 results in greedy token selection. I used to think that, but after observing very clear non-determinism I no longer do. I don’t actually know what temp 0 means, because mathematically you aren’t going to divide by 0. OpenAI has never said that temp 0 means greedy. I suspect they are just setting the temperature to some small value or something like that. Additionally, the GPT-4 mixture-of-experts model adds randomness to the token probability calculations, so there really is no such thing as deterministic GPT-4 output, at least for us consumers of the API. Even if you had greedy sampling, the probabilities can change slightly, so the same tokens won’t necessarily be picked.
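
Two of the points above can be shown with a toy example. First, dividing by a very small temperature gives almost the same result as greedy selection, so no actual division by zero is needed. Second, when two tokens are nearly tied, a tiny change in the logits flips the greedy choice. The logits below are hypothetical, and the perturbation simply stands in for whatever numerical noise the serving stack may introduce; this is a sketch, not a claim about how GPT-4 works internally.

```python
import math

def softmax(logits, temperature):
    """Temperature-scaled softmax for a positive temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest value (greedy selection)."""
    return max(range(len(values)), key=lambda i: values[i])

# As the temperature shrinks, the top token's probability approaches 1.
top_probs = [max(softmax([2.0, 1.0, 0.1], t)) for t in (1.0, 0.1, 0.01)]

# Hypothetical near-tie: the same prompt on two runs, differing only by
# a tiny amount of numerical noise in the logits.
logits_run_a = [5.000001, 5.000000, 1.0]
logits_run_b = [5.000000, 5.000001, 1.0]

greedy_a = argmax(logits_run_a)  # picks token 0
greedy_b = argmax(logits_run_b)  # picks token 1
```

Once one token differs, everything generated after it can diverge, which would be consistent with seeing different completions at temperature 0.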