API calls slower for higher temperature?

Is it just me, or are API calls with higher temperature also slower for you, at least for chat completions with gpt-3.5-turbo and gpt-4?

Latency has no relation to the temperature used in the call. It must just be a coincidence, or GPT may just be slower in general.

1 Like

Pretty easy to find out. Run a loop over three temperatures (0.8, 0.1, 1.8) with max_tokens=100.
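
For reference, a minimal sketch of such a loop, assuming the current OpenAI Python SDK (v1); the prompt and the exact temperature ordering are my own stand-ins, not the script that produced the numbers below:

```python
# Sketch of a temperature/throughput benchmark loop (OpenAI Python SDK v1).
# Assumes OPENAI_API_KEY is set in the environment; the prompt is a placeholder.
import time
from openai import OpenAI

client = OpenAI()

for temp in [0.8, 0.1, 1.8] * 4:  # 12 calls; the actual run below alternated the order of the last two
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Write about the ocean."}],
        temperature=temp,
        max_tokens=100,
    )
    elapsed = time.perf_counter() - start
    tokens = resp.usage.completion_tokens
    print(f"temp {temp}, resp_tokens: {tokens}. Tokens/s: {tokens / elapsed:.2f}")
```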

Twelve runs, in sets of three temperatures, with the order of the last two alternating:

temp 0.8, resp_tokens: 100. Tokens/s: 26.13
temp 0.1, resp_tokens: 100. Tokens/s: 27.52
temp 1.8, resp_tokens: 100. Tokens/s: 25.56
temp 0.8, resp_tokens: 100. Tokens/s: 19.94
temp 1.8, resp_tokens: 51.  Tokens/s: 12.49 ***** (stopped early)
temp 0.1, resp_tokens: 100. Tokens/s: 27.06
temp 0.8, resp_tokens: 100. Tokens/s: 25.12
temp 0.1, resp_tokens: 100. Tokens/s: 30.00
temp 1.8, resp_tokens: 100. Tokens/s: 27.29
temp 0.8, resp_tokens: 100. Tokens/s: 26.83
temp 1.8, resp_tokens: 100. Tokens/s: 24.31
temp 0.1, resp_tokens: 100. Tokens/s: 42.76

Conclusion: tokens/s looks random regardless of temperature. One possibility for the standout last result: that call might have hit a fast H100 machine.

4 Likes

While the tokens/second appears random over temperature, I would say that “higher temp” == “more words in the output” (just try a temperature of 2 and watch the output hit the max_tokens limit).

So the overall latency can appear longer simply because more tokens are generated, which is expected.

Maybe that is what the OP is experiencing. In that case, though, the extra words are usually a desired outcome. (A sketch of bounding latency this way follows below.)
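
If the longer outputs at high temperature are not wanted, one sketch (my own, not from the benchmark above) is to cap max_tokens, which bounds generation time, and check the finish reason to see whether the cap was hit:

```python
# Hedged sketch: bound worst-case latency by capping output length, then
# check whether the model actually hit the cap. OpenAI v1 SDK; placeholder prompt.
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write about the ocean."}],
    temperature=1.8,
    max_tokens=100,  # hard cap on output tokens, hence on generation time
)
if resp.choices[0].finish_reason == "length":
    # Truncated at the cap -- at high temperatures this happens more often,
    # which is the "more words" effect described above.
    print("hit the max_tokens cap")
```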

1 Like

Thank you very much. This seems to be it! I ran a dedicated analysis with a pretty long pre-prompt, and with temperature > 1.4 the responses not only get longer…

… and take longer to generate…

… but also errors become more frequent:

These errors are mostly 502 errors:
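
For anyone hitting the same 502s, a simple retry wrapper helps; this is a hedged sketch of my own (the helper name, retry count, and backoff schedule are my choices, not from this thread):

```python
# Hedged sketch of retrying 502 responses with exponential backoff (OpenAI v1 SDK).
import time

import openai
from openai import OpenAI

client = OpenAI()

def create_with_retry(retries=3, **kwargs):
    """Hypothetical helper: retry chat completions that fail with a 502."""
    for attempt in range(retries):
        try:
            return client.chat.completions.create(**kwargs)
        except openai.APIStatusError as e:
            # Re-raise anything that isn't a 502, or if we're out of retries.
            if e.status_code != 502 or attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, ...

resp = create_with_retry(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write about the ocean."}],
    temperature=1.6,
)
```

Note that recent versions of the openai SDK can also retry 5xx responses automatically via the client's max_retries option, so the explicit wrapper is mostly useful if you want custom backoff or logging.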