The main bottlenecks that slow down the API response

Hi,

I am using different text/image models via the OpenAI API. I notice the response is sometimes very slow, and I want to find a way to speed it up.

So I wonder: what are the main bottlenecks that slow down the response?

Note: I must use the gpt-4 and dall-e-3 models.

I figure the main factors may be:

  1. The length of the prompt sent to OpenAI.
  2. The number of requests I bundle into one prompt. For example, if I pack several requests into a single prompt, that will slow down the response (see the sketch after this list for splitting them up).
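If the slowdown really is from bundling several requests into one prompt, one workaround is to send them as separate, concurrent calls. Here is a minimal sketch assuming the v1.x `openai` Python library and an `OPENAI_API_KEY` environment variable; the questions are just placeholders:

```python
import asyncio
from openai import AsyncOpenAI  # assumes openai python library v1.x

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment


async def ask(question: str) -> str:
    # One small request per question instead of bundling them all
    # into a single long prompt.
    response = await client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content


async def main():
    questions = [
        "Summarize the plot of Hamlet in one sentence.",  # placeholder
        "What is the capital of Australia?",              # placeholder
    ]
    # Fire the requests concurrently; total wall time is roughly the
    # slowest single request, not the sum of all of them.
    answers = await asyncio.gather(*(ask(q) for q in questions))
    for q, a in zip(questions, answers):
        print(q, "->", a)


asyncio.run(main())
```

Note that concurrent calls count against your rate limits just the same, so this trades throughput headroom for latency.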

It’s mostly your rate-limit tier + size of prompt(s) + network health… I believe they’re working on ways to make it even faster.
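You can at least see the limits your tier gives you: the API reports them in the `x-ratelimit-*` response headers. A quick sketch, assuming the v1.x `openai` Python library (whose `.with_raw_response` accessor exposes the raw HTTP response):

```python
from openai import OpenAI  # assumes openai python library v1.x

client = OpenAI()

# .with_raw_response exposes the HTTP response so the rate-limit
# headers documented by OpenAI can be inspected.
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "ping"}],
)

for header in (
    "x-ratelimit-limit-requests",
    "x-ratelimit-remaining-requests",
    "x-ratelimit-limit-tokens",
    "x-ratelimit-remaining-tokens",
):
    print(header, "=", raw.headers.get(header))

completion = raw.parse()  # the usual ChatCompletion object
print(completion.choices[0].message.content)
```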

Another thing you can do is try a smaller/faster model. While DALLE2 won’t replace DALLE3, you can sometimes get by with GPT-3.5-turbo or gpt-3.5-turbo-instruct with clever prompting and a one-shot or two-shot …
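For anyone unfamiliar with the term, "two-shot" just means putting two worked examples in the message history before the real input. A minimal sketch (the sentiment task and example reviews are made up for illustration):

```python
from openai import OpenAI  # assumes openai python library v1.x

client = OpenAI()

# Two worked examples ("shots") steer the smaller model toward the
# desired output format, trading the bigger model's raw capability
# for lower latency.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Classify each review as positive or negative."},
        {"role": "user", "content": "The battery lasts all day."},
        {"role": "assistant", "content": "positive"},
        {"role": "user", "content": "It broke after a week."},
        {"role": "assistant", "content": "negative"},
        {"role": "user", "content": "Setup was quick and painless."},
    ],
)
print(response.choices[0].message.content)  # expected: "positive"
```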

I can’t seem to find details on latency for each tier. Any idea where that info might be?