GPT-4 is faster again these days

Given all the complaints a month or two ago about gpt-4 inference speed, I thought I’d note that gpt-4 has lately been running faster on average than it did back then.
I don’t know if this is “different roll of the die” or “die-off in load” or “additional provisioned capacity” but, whatever it is, I’ll take it, and thank whoever needs thanking :slight_smile:


How fast is it? I mean, how many seconds does it take to generate articles?

Generation time is directly proportional to the number of tokens generated, because each token requires one full evaluation (forward pass) through the model.
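As a sketch of why latency scales with output length: autoregressive decoding produces one token per model evaluation, so total time grows linearly with the token count. A toy loop (the per-token forward pass and `step_time` are stand-ins, not real API calls):

```python
import time

def generate(n_tokens, step_time=0.0):
    """Toy autoregressive decode loop.

    Each iteration stands in for one forward pass through the model,
    which yields exactly one new token -- so total latency is roughly
    n_tokens * step_time.
    """
    tokens = []
    for _ in range(n_tokens):
        time.sleep(step_time)   # stand-in for one model evaluation
        tokens.append("tok")    # stand-in for the sampled token
    return tokens

# Twice the tokens means roughly twice the wall-clock time;
# a long article is slower than a short answer for this reason alone.
```

This is why a long article takes proportionally longer than a short reply, independent of any queuing delay on the server side.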

It’s still clearly slower than gpt-3.5-turbo, but the 40-second waits for a paragraph we used to see (probably queuing time?) are now largely gone.