I’ve noticed that running the same prompt multiple times sometimes produces noticeably different responses.
Is this expected behavior due to model randomness, or could it be related to temperature or other parameters?
Any clarification on how to achieve more deterministic outputs would be helpful.
Yes, that’s totally normal.
These models aren’t deterministic by default, so the same prompt with the same parameters can still produce different outputs. Sampling a different token early on is enough to shift the whole rest of the answer.
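To see why sampling causes this, here’s a toy sketch (not the actual model code) of temperature-scaled next-token sampling; the logits are made-up numbers just for illustration:

```python
import numpy as np

rng = np.random.default_rng()
logits = np.array([2.0, 1.5, 0.3])  # hypothetical scores for 3 candidate tokens

def sample_token(logits, temperature):
    if temperature == 0:
        return int(np.argmax(logits))           # greedy: always the top token
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))  # random draw -> runs can diverge

print([sample_token(logits, 0.8) for _ in range(5)])  # often differs run to run
print([sample_token(logits, 0.0) for _ in range(5)])  # always the same token
```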
If you want it to behave more consistently, try:
• temperature = 0
• top_p = 1 (or lower it to restrict sampling to only the highest-probability tokens)
But even then, “identical every time” isn’t guaranteed (floating-point and batching effects on the serving side can still introduce small variations); it just reduces the randomness a lot.
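As a minimal sketch of what that looks like in a request, assuming you’re calling the OpenAI Python SDK (the model name here is just an example, swap in whatever you’re using):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[{"role": "user", "content": "Summarize this in one sentence: ..."}],
    temperature=0,  # near-greedy decoding: pick the most likely token
    top_p=1,        # don't truncate the sampling distribution further
)
print(response.choices[0].message.content)
```

Running this twice with the same prompt should give you much more consistent answers than the defaults, but still not a hard guarantee of byte-identical output.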
You can’t replay the same prompt fully deterministically in an LLM.