Benchmarking response time for GPT-4 by context + output tokens

It has been previously observed that response time increases with the max_tokens parameter. However, this is hard to benchmark because latency also depends on how many requests are hitting the API: the same request will take less time on a day with relatively low traffic, and vice versa.

Also, given that the number of customers is always growing and that OpenAI keeps upgrading its infrastructure, such benchmarks wouldn't mean much for development beyond tracking API performance over time.
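One way to partially control for the traffic-dependent variance described above is to repeat each request several times and compare medians rather than single measurements. Here's a minimal sketch of that idea; `fake_request` is a hypothetical stand-in, and in practice you'd replace it with your actual chat-completion client call (passing the same prompt while varying only `max_tokens`):

```python
import statistics
import time

def benchmark(request_fn, max_tokens_values, repeats=5):
    """Time request_fn at each max_tokens setting; repeat and take the
    median to smooth out traffic-dependent variance."""
    results = {}
    for max_tokens in max_tokens_values:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            request_fn(max_tokens=max_tokens)  # swap in a real API call here
            timings.append(time.perf_counter() - start)
        results[max_tokens] = statistics.median(timings)
    return results

# Hypothetical stand-in for an API call: simulates latency growing
# with max_tokens so the benchmark harness can be exercised offline.
def fake_request(max_tokens):
    time.sleep(0.001 * max_tokens)

print(benchmark(fake_request, [16, 64, 256], repeats=3))
```

Even with medians, the numbers will still drift day to day as traffic and infrastructure change, so the relative ordering across `max_tokens` values is more meaningful than the absolute times.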
