temperature, top_p and n are fixed at 1, while presence_penalty and frequency_penalty are fixed at 0.
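In practice that means requests to these models just leave the sampling parameters at their defaults. A minimal sketch, assuming the official OpenAI Python client and an illustrative reasoning-model name ("o1-mini" here, not taken from the quoted docs):

```python
# Minimal sketch: calling a reasoning model without touching the sampling
# parameters, since temperature / top_p / n are fixed at 1 and the two
# penalties at 0 for these models. Model name is illustrative only.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o1-mini",  # illustrative reasoning-model name, an assumption
    messages=[{"role": "user", "content": "Refactor this function to be iterative."}],
    # temperature, top_p, n, presence_penalty, frequency_penalty are omitted:
    # per the docs above they are fixed at 1 / 1 / 1 / 0 / 0, so there is
    # nothing useful to override here.
)
print(response.choices[0].message.content)
```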
Why are temperature and top_p set to 1? This is a reasoning model, not a creative model. Wouldn't setting temperature and top_p high increase the likelihood of hallucinations and of sampling tokens that lead to less likely, less accurate outcomes?
For me, that's not just a theoretical prediction about how those values change the output. In my experience across gpt-3, 3.5, 4, 4o, turbos, claude, phi, mistral, llamas, in eval environments, they all produce the best code (in terms of quality and of sticking to the instructions) when temperature is 0 and top_p is very close to 0. I don't mind non-creative and repetitive responses.
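To make the mechanics concrete, here is a toy sketch (not any vendor's actual implementation) of how temperature and top_p act on next-token logits, showing why pushing either one toward 0 collapses sampling onto the single most likely token:

```python
# Toy illustration of temperature scaling and nucleus (top_p) sampling.
import numpy as np

def sample_token(logits, temperature=1.0, top_p=1.0, rng=np.random.default_rng()):
    if temperature <= 1e-6:
        # Temperature ~0: greedy decoding, always the argmax token.
        return int(np.argmax(logits))
    # Temperature scaling: divide logits before softmax.
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    # Nucleus (top_p) filtering: keep the smallest set of tokens whose
    # cumulative probability reaches top_p, then renormalize and sample.
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    keep = order[: int(np.searchsorted(cum, top_p)) + 1]
    kept = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept))

logits = np.array([4.0, 3.5, 1.0, 0.5])                   # toy next-token scores
print(sample_token(logits, temperature=0.0))               # always token 0
print(sample_token(logits, temperature=1.0, top_p=0.05))   # nucleus collapses to token 0
print(sample_token(logits, temperature=1.0, top_p=1.0))    # any token can be drawn
```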
Please help me understand that choice for default temp and top_p.
Possible, although there have been temperature-looking problems reported when prompting in other languages and getting oddball third-language tokens back, so the creativity might be going to the wrong place.
Then, if you want variations to pick a best response from, temperature also matters, as with the best_of API parameter on the completions endpoint (which selects by total logit probability rather than using an AI judge). best_of: 10 is wasting your money if there's no sampling variety.
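A minimal sketch of that point, assuming the OpenAI Python client and the legacy completions endpoint (the model name is illustrative): best_of generates candidates server-side and keeps the one with the highest total log probability, so it only buys you anything if temperature/top_p let the candidates differ in the first place.

```python
# best_of on the legacy Completions endpoint: with temperature 0 all ten
# candidates would be identical, so the extra generations are wasted.
from openai import OpenAI

client = OpenAI()

response = client.completions.create(
    model="gpt-3.5-turbo-instruct",  # illustrative completions model, an assumption
    prompt="Write a one-line docstring for a function that reverses a list.",
    max_tokens=40,
    temperature=0.8,  # some sampling variety, so the candidates actually differ
    best_of=10,       # server generates 10 candidates, keeps the most probable
    n=1,              # only that best candidate is returned to you
)
print(response.choices[0].text)
```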