I’m using the GPT-4.1 model to generate responses based on retrieved data (40-50 rows in CSV format). The data is given as context (~5,000 input tokens) to the model. However, when I set a lower temperature, generation takes too long (sometimes up to 10 minutes).
Why would sending sampling parameters lead to longer generation times?
The raw product of AI inference is a map from the model’s token IDs to logits: scores for how certain the model is about each possible next token.
These are converted to probabilities and handed to a ranked multinomial sampler, which draws the next token at random, weighted toward the most likely candidates.
Temperature modifies these log-probabilities by dividing each one by the temperature parameter, redistributing the values: an extra operation compared with using the model’s native output. With a 200k-token vocabulary and a model generating 100 tokens per second, you can anticipate roughly 20 million additional divide operations per second.
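For a sense of what that looks like, here is a minimal sketch in Python/NumPy of temperature scaling followed by multinomial sampling; the function and values are illustrative assumptions, not the service’s actual internals:

```python
import numpy as np

def sample_with_temperature(logits: np.ndarray, temperature: float = 1.0) -> int:
    """Pick a next-token ID from raw logits, applying temperature scaling."""
    # Temperature divides every logit before the softmax: one extra
    # elementwise operation across the whole vocabulary per generated token.
    scaled = logits / temperature
    # Softmax turns the scaled logits into a probability distribution.
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Multinomial draw: random, but weighted toward high-likelihood tokens.
    return int(np.random.choice(len(probs), p=probs))

# Illustrative only: a 200k-entry vocabulary, as in the estimate above.
logits = np.random.randn(200_000).astype(np.float32)
next_token = sample_with_temperature(logits, temperature=0.2)
```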
Even so, temperature should be an effect that is only detectable when benchmarking, not something noticeable at the scale of minutes. You might be misattributing causation.
If your goal is to constrain toward quality, you might use the top_p parameter instead: rather than redistributing values, it truncates the distribution at a cumulative probability-mass cutoff, essentially eliminating poor tokens.
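As a rough sketch of what that cutoff does conceptually (again hypothetical code, not the actual implementation behind the API):

```python
import numpy as np

def top_p_filter(probs: np.ndarray, top_p: float = 0.9) -> np.ndarray:
    """Keep only the smallest set of tokens whose cumulative probability reaches top_p."""
    order = np.argsort(probs)[::-1]                       # most to least likely
    cumulative = np.cumsum(probs[order])                  # running probability mass
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1  # tokens needed to reach top_p
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]                          # poor tokens are zeroed out
    return filtered / filtered.sum()                      # renormalize the survivors
```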
If an output seems incomplete, that can simply be the model having predicted and selected its “stop” sequence. In that case, instead of seeing the stop sequence itself (for example a vertical pipe and linefeed) in the text, you receive a finish_reason of stop. This behavior is something you can affect through prompt and model choice; the sampling parameters can increase or reduce the variance between responses to identical input.
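If you want to check whether a short response ended naturally or hit the token limit, a sketch along these lines, assuming the OpenAI Python SDK and a gpt-4.1 model name (adjust both to your setup), shows where to look:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4.1",   # assumption: substitute the exact model you are using
    messages=[
        {"role": "system", "content": "Answer using only the provided CSV data."},
        {"role": "user", "content": "CSV context here...\n\nQuestion here..."},
    ],
    top_p=0.5,         # constrain by probability mass instead of lowering temperature
    max_tokens=1024,
)

choice = response.choices[0]
print(choice.finish_reason)   # "stop" = model chose to end; "length" = hit max_tokens
print(choice.message.content)
```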