Benchmarking response time for GPT-4 by context + output tokens

It has been previously observed that response time increases with the max_tokens parameter. However, this is hard to benchmark because latency also depends on how many requests are hitting the API: the same request will take less time on a day with relatively low traffic, and vice versa.

Also, given that the number of customers is always growing and that OpenAI keeps upgrading its infrastructure, such benchmarks wouldn't mean much for development beyond tracking API performance over time.
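One way to partially control for the traffic-dependent variance described above is to repeat each request several times and compare medians rather than single measurements. Here's a minimal sketch of that idea; `fake_request` is a hypothetical stand-in, and in practice you'd replace it with your actual chat-completion client call (passing the same prompt while varying only `max_tokens`):

```python
import statistics
import time

def benchmark(request_fn, max_tokens_values, repeats=5):
    """Time request_fn at each max_tokens setting; repeat and take the
    median to smooth out traffic-dependent variance."""
    results = {}
    for max_tokens in max_tokens_values:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            request_fn(max_tokens=max_tokens)  # swap in a real API call here
            timings.append(time.perf_counter() - start)
        results[max_tokens] = statistics.median(timings)
    return results

# Hypothetical stand-in for an API call: simulates latency growing
# with max_tokens so the benchmark harness can be exercised offline.
def fake_request(max_tokens):
    time.sleep(0.001 * max_tokens)

print(benchmark(fake_request, [16, 64, 256], repeats=3))
```

Even with medians, the numbers will still drift day to day as traffic and infrastructure change, so the relative ordering across `max_tokens` values is more meaningful than the absolute times.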
