I want to know how I can get the exact same response from gpt-5 by sending the same parameters,
because I see I can’t set `temperature` for any of the gpt-5 models.
You cannot get the same response from any OpenAI model except by luck, even from those that accept a temperature and/or top_p of 0 to employ greedy sampling (top-1 rank).
The underlying computations vary between identical runs, delivering different logprobs on each API call, of a magnitude where an output token can switch, after which the entire following sequence diverges.
OpenAI offers gpt-5-chat-preview, which is similar to the current non-thinking ChatGPT model. It has a reduced feature set that discourages developing against it, but it does accept `"top_p": 0` (or < 0.000005), which would have it output only its best prediction path.
Especially in “pro” models, which by name offer parallel test-time compute, variation is essential: it produces different candidate token runs or activates different expert layers, for a best-of perplexity evaluation that can comparatively select a better final output.
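For reference, a call like that might look as follows with the Python SDK (a minimal sketch; the prompt is a placeholder, and even this does not guarantee identical outputs between runs):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-5-chat-preview",   # model name as discussed above; verify availability
    messages=[
        {"role": "user", "content": "Write two lines of a country song about a truck."},
    ],
    top_p=1e-9,                   # effectively greedy: only the top-ranked token survives
)
print(response.choices[0].message.content)
```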
Thank you;
I want to get as similar a result as possible from gpt-5-nano-2025-08-07 for the same prompt. What shall I do, or what shall I set?
gpt-5-nano will NOT be the model to give you similar answers:
- it generates a lot of internal reasoning - hundreds or thousands of unseen tokens, each with a high chance of divergence before you even see any output;
- you cannot send any sampling parameters to control the randomness.
If you want the same output from the same input, you will need to hash the request and record a prior response for playback - where seeing the same thing again is also free.
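A rough sketch of that record-and-playback idea, assuming a local JSON file and a caller-supplied request function (both made up for illustration):

```python
import hashlib
import json
import os

CACHE_FILE = "response_cache.json"   # hypothetical local store


def cached_completion(call_model, **request):
    """Return the recorded response for this exact request, calling the API only once."""
    key = hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()
    cache = {}
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE) as f:
            cache = json.load(f)
    if key not in cache:
        cache[key] = call_model(**request)   # first (and only paid) call
        with open(CACHE_FILE, "w") as f:
            json.dump(cache, f)
    return cache[key]
```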
If you have an actual application that needs similarity, pick a model like gpt-4 and run it with top_p:0 as described. No current model is deterministic, though.
Here are some runs of the ‘same input’:
gpt-5-nano
at “minimal” reasoning effort (your only control), already different by the second visible token:
Verse:
Well I sold her silver pictures for a toolbox full of chrome,
She took the curtains, left the cat, but I kept our old home,
Verse:
I told my mama once, “I’ll settle down someday,”
But this hauler’s got my heart in a rusted, honest way,
Verse 1:
He wakes before the rooster, coffee cold in a dented cup,
Keys jingle like an old love song that he can’t give up.
gpt-4.1-mini, top_p:0, temperature:0
Here we need to wait for closely-ranked top tokens before we see a switch…
Run 1
(Verse)
Well, I woke up this mornin’, heart heavy as a stone,
She packed her bags and left me, said she’s better off alone.
But out in the driveway, under that big ol’ sky,
My faithful old Ford’s waitin’, and it never says goodbye.
(Chorus)
’Cause my truck don’t talk back, don’t ever make me cry,
It’s got a bed full of memories and a tailgate to ride.
While she’s out chasin’ dreams, drivin’ someone new,
I’m sittin’ here with my truck, and it’s lovin’ me true.
Run 2
(Verse)
Well, I woke up this mornin’, heart heavy as a stone,
She packed her bags and left me, said she’s better off alone.
But I got my old Chevy, rusted but it’s true,
That truck’s been my best friend, through and through.
(Chorus)
I love my faithful truck more than I loved that girl,
She left me in the dust, but my truck still rules my world.
With a tailgate down and a cold beer in my hand,
That old truck’s the only thing that’ll ever understand.
Hi _j
thanks for your explanation; in fact I only want to get a number from gpt-5-nano for the same prompt.
For example, my prompt is "predict a number (0~100, the happier, the higher) based on the sentence: I got a new iPhone and I am happy".
I want to receive similar numbers each time for this prompt; for example, if I test it 5 times, I hope to receive a series of responses like [90, 91, 89, 92, 90] rather than [50, 60, 90, 30, 10].
If you want a model to perform a judgement for you:
- use a model that is not a random token-pattern factory - not ‘nano’;
- use a non-reasoning model, one that provides sampling and logprobs;
- then use techniques that look beyond the sampled token, to the underlying statistics in the logprobs (a sketch follows).
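For the 0~100 example above, that last point might look roughly like this (a sketch under assumptions: `gpt-4.1-mini` and the prompt wording are stand-ins, and it treats the score as arriving in a single numeric first token):

```python
import math
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4.1-mini",            # example non-reasoning model that returns logprobs
    messages=[{"role": "user", "content":
        "Rate the happiness of this sentence from 0 to 100. "
        "Answer with the number only: I got a new iPhone and I am happy."}],
    max_tokens=3,
    temperature=0,
    top_p=0,
    logprobs=True,
    top_logprobs=20,                 # candidate tokens reported for each position
)

# Probability-weighted average over the numeric candidates for the first token.
weighted, total = 0.0, 0.0
for cand in resp.choices[0].logprobs.content[0].top_logprobs:
    if cand.token.strip().isdigit():
        p = math.exp(cand.logprob)
        weighted += int(cand.token) * p
        total += p

print(weighted / total if total else None)
```

Averaging over the candidates smooths out run-to-run flips in which single number happens to get sampled.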
I’d like to show you something - how bad the OpenAI models are at determinism.
I have a classifier written specifically to give an inescapable enum and extreme certainty in generation:
Sent Messages
SYSTEM
You are a classifier, answering every question in JSON only from allowed options. You are an expert at finding this best truthful judgement answer to any input question. Regardless of the type of input or how inapplicable, you still must determine the best choice.
# Responses
You produce a JSON answer sent to an API with a single key `answer`. The value of `answer` must be chosen from only these enums: ['yes', 'no']
# Permitted JSON responses
## select one only from:
{"answer": "yes"}
{"answer": "no"}
Important: output of any other value than one of the permitted JSON responses will result in error!
USER
yes or no: Is selecting yes here completely arbitrary?
or other questions are dropped in there instead.
Enums used in API call message, with applied bias
Token /Bias: {‘yes’: 0, ‘no’: 0}
Token#/Bias: {6763: 0, 1750: 0}
I then run many ambiguous questions to find ones that can be good examples of not simply giving 100% probability.
I find the whole exercise to be completely pointless, as the models now offered are basically a random logit token factory.
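Roughly how percentages like the ones below can be read out (a sketch; `SYSTEM_PROMPT` stands for the full system message quoted above, and the actual extraction used may differ):

```python
import math
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = "..."  # paste the full SYSTEM message quoted above

resp = client.chat.completions.create(
    model="gpt-4.1",                     # the model identified below
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "yes or no: Is selecting yes here completely arbitrary?"},
    ],
    logit_bias={6763: 0, 1750: 0},       # the 'yes'/'no' token IDs listed above
    logprobs=True,
    top_logprobs=5,
    temperature=0,
)

# Find the position holding the enum value and renormalize yes vs. no.
for position in resp.choices[0].logprobs.content:
    probs = {c.token: math.exp(c.logprob) for c in position.top_logprobs}
    if "yes" in probs and "no" in probs:
        total = probs["yes"] + probs["no"]
        print(f"yes: {probs['yes'] / total:.9%}")
        print(f"no : {probs['no'] / total:.9%}")
        break
```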
Trial 1
Top questions with closest yes/no probabilities:
- Is selecting yes here completely arbitrary?
  yes: 49.999999905%
  no : 49.999999905%
- Will a fair coin land heads?
  yes: 99.984152749%
  no : 0.015843624%
- Is selecting no here completely arbitrary?
  yes: 0.005829127%
  no : 99.994163670%
Trial 2
Top questions with closest yes/no probabilities:
- Is selecting yes here completely arbitrary?
  yes: 2.931222585%
  no : 97.068770450%
- Will a fair coin land heads?
  yes: 99.996452273%
  no : 0.003535625%
- Is your next output token “no”?
  yes: 0.002753569%
  no : 99.997238993%
Trial 3
- Is selecting yes here completely arbitrary?
  yes: 9.534945969%
  no : 90.465043176%
- Is the correct answer here “no”?
  yes: 99.973869831%
  no : 0.026119030%
- Is your next output token “no”?
  yes: 0.026119030%
  no : 99.973869831%
An AI model that can jump from a quantized no=49.99% to no=97.07% between runs is completely preposterous. Also, that every other “arbitrary” choice in the input list came out above 99.97% shows the AI is ‘certain’ when it has no right to be. That model is gpt-4.1. Offering you classification or judgement code that could weight numeric scores by their logprobs also becomes something that cannot be done with what OpenAI now offers.
Is that the same model architecture deciding to ban organizations?