Hi, this is about the response/generation time for GPT-4. I am mostly concerned with performance over API calls. My question has two parts.
While streaming, is the time to receive the first generated token related to the size of the input prompt?
Say the model generates 100 tokens for a 2k-token input prompt; will it take more time to generate the same 100 tokens for a 10k-token input prompt?
I know that should typically be the case, but I want to hear others' views on this as well.
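One way to answer this empirically is to time the first streamed chunk for prompts of different lengths. Below is a minimal sketch of a timing helper that works with any streaming iterator; the commented-out OpenAI client usage is an assumption about how you might wire it up, not a verified snippet for the current SDK.

```python
import time
from typing import Iterable, List, Optional, Tuple


def time_to_first_token(stream: Iterable) -> Tuple[Optional[float], List]:
    """Consume a streaming iterator, returning (seconds until the first
    chunk arrived, list of all chunks). Works with any iterable, so it can
    wrap an API streaming response or a test generator."""
    start = time.perf_counter()
    ttft = None
    chunks = []
    for chunk in stream:
        if ttft is None:
            # First chunk: record time-to-first-token.
            ttft = time.perf_counter() - start
        chunks.append(chunk)
    return ttft, chunks


# Hypothetical usage with the OpenAI Python client (requires an API key;
# exact client call is an assumption):
#
# from openai import OpenAI
# client = OpenAI()
# stream = client.chat.completions.create(
#     model="gpt-4",
#     messages=[{"role": "user", "content": prompt}],
#     stream=True,
# )
# ttft, chunks = time_to_first_token(stream)
# print(f"time to first token: {ttft:.2f}s for a prompt of len {len(prompt)}")
```

Running this with a 2k-token prompt and a 10k-token prompt (same requested output length) would show directly whether time-to-first-token grows with input size.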