It has been observed before that response time increases with the max_tokens parameter. However, this is hard to benchmark because response time also depends on the number of requests hitting the API, meaning that the same request will take less time on a day with relatively low traffic and more time on a busy one.
Also, given that the number of customers keeps increasing and that OpenAI keeps upgrading its infrastructure, benchmarks wouldn’t mean much for development; at best they show how API performance changes over time.
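If a rough measurement is still wanted, one possible approach is to interleave the max_tokens values and take the median of several runs, so that a short traffic spike affects every setting about equally. The following is only a minimal sketch under stated assumptions: send_request is a hypothetical callable that you would replace with your own API call, and fake_request is a stand-in that just sleeps so the script runs without network access. The numbers it prints say nothing about the real API.

```python
import statistics
import time
from typing import Callable, Dict, List, Sequence


def time_call(send_request: Callable[[int], object], max_tokens: int) -> float:
    """Return wall-clock seconds taken by one send_request(max_tokens) call."""
    start = time.perf_counter()
    send_request(max_tokens)
    return time.perf_counter() - start


def benchmark(
    send_request: Callable[[int], object],
    token_limits: Sequence[int],
    repeats: int = 5,
) -> Dict[int, float]:
    """Return the median latency in seconds for each max_tokens value.

    The values are interleaved within each round, so a temporary change
    in traffic hits all of them instead of skewing a single setting.
    """
    limits: List[int] = list(token_limits)
    samples: Dict[int, List[float]] = {limit: [] for limit in limits}
    for _ in range(repeats):
        for limit in limits:
            samples[limit].append(time_call(send_request, limit))
    return {limit: statistics.median(times) for limit, times in samples.items()}


def fake_request(max_tokens: int) -> None:
    # Hypothetical stand-in for a real API call: sleeps in proportion
    # to max_tokens. Replace this with your own request function.
    time.sleep(max_tokens * 0.0001)


if __name__ == "__main__":
    for limit, seconds in benchmark(fake_request, [16, 64, 256]).items():
        print(f"max_tokens={limit}: median {seconds:.4f}s")
```

Results from a script like this are only comparable within the same run; for the reasons above, numbers collected on different days or weeks mostly reflect changes in traffic and infrastructure.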