OpenAI Why Are The API Calls So Slow? When will it be fixed?

This is a disaster! I went into this OpenAI API thing with so much enthusiasm. And then I see that the timing for a simple call is over 8 seconds. I am using Netlify serverless functions, and they have a 10-second limit on a function. Enthusiasm levels dropping off faster than an OpenAI response…

Welcome to the forum,

You can use the streaming option if you need to pass at least something back quickly. AI services are a new kind of information source, and it is going to take time for other services to offer longer timeouts.

Thank you. Maybe I was too quick to judge… I used text-davinci-003. I replaced it with -002, the response now takes around 1-2 secs.

Great news. Yes, the lower-order models do less computation and will be faster. The entire information technology industry is taking time to accommodate the new, and slightly slower, AI technology. Things will improve with time.

I cannot recreate anything that takes over 2 seconds, regardless of model, unless I’m making it read whole PDFs or something… aichat is really fast… but I’m running from an Android phone, so it has to be in the code or the connection, not the device.

Some users have reported slow response times with the OpenAI API. It seems to be a capacity issue, and the free tier may have tighter rate limits. Implementing streaming may give the impression of a faster response. Lower-order models like -002 can provide quicker responses compared to higher-order models like -003. The industry is still adapting to the new AI technology, so improvements are expected over time.

The last few days have been painfully slow. Tons of timeouts. I’m paying $20 a month for it to be crazy slow??

Use ‘gpt-3.5-turbo-16k-0613’ rather than ‘gpt-3.5-turbo’

How is the response different when you add stream:true?

Update: Actually, I asked GPT-4 Codex the question and it responded:

Handling a streamed response requires you to handle the incoming data incrementally rather than all at once. The cURL library in PHP allows you to specify a callback function that will be called as chunks of the response are received.

Here’s a modified version of your function that can handle a streamed response by using the CURLOPT_WRITEFUNCTION option:

public function solrai_executeCurlRequest($url, $headers, $data) {
  $curl = curl_init();

  curl_setopt($curl, CURLOPT_URL, $url);
  curl_setopt($curl, CURLOPT_POST, true);
  curl_setopt($curl, CURLOPT_POSTFIELDS, json_encode($data));
  curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
  curl_setopt($curl, CURLOPT_RETURNTRANSFER, false); // Changed to false to handle chunks manually

  $response = '';

  // Called once per received chunk; accumulate the chunks into $response
  curl_setopt($curl, CURLOPT_WRITEFUNCTION, function ($ch, $chunk) use (&$response) {
    $response .= $chunk;
    return strlen($chunk); // Return the number of bytes handled
  });

  $startTime = microtime(true);

  $result = curl_exec($curl);

  $endTime = microtime(true);
  $elapsedTime = $endTime - $startTime;
  $this->elapsedTime = $elapsedTime;

  $error_msg = null;
  if (curl_errno($curl)) {
    $error_msg = curl_error($curl);
  }

  // Your existing logging and error handling here...

  if ($error_msg) {
    return ['error' => $error_msg];
  }

  // Note: with stream:true the body is a series of "data: ..." events rather than
  // one JSON document, so json_decode() on the whole buffer returns null;
  // each event would need to be decoded separately.
  $responseBody = json_decode($response, true);

  // Append elapsed time to the response body
  $responseBody['_elapsed_time'][$url] = $elapsedTime;

  return $responseBody;
}

Note the following changes:

  • CURLOPT_RETURNTRANSFER is set to false to indicate that we will handle the response data manually.
  • A callback function is defined using CURLOPT_WRITEFUNCTION to handle each chunk of the response as it is received. The callback appends each chunk to the $response variable.
  • The response is handled incrementally in the callback, so the $response variable is available with the full response data after curl_exec completes.

This modification should allow your function to handle streamed responses. Make sure to test thoroughly, as handling streamed responses can be more complex than handling standard responses.
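To illustrate what changes with stream:true: the body arrives as server-sent events, one "data: {...}" line per chunk, with "data: [DONE]" at the end, and each chat chunk carries a small "delta" instead of a full message. Below is a minimal Python sketch of reassembling the text from such chunks. The helper name iter_sse_content and the sample bytes are made up for illustration, and no network call is made.

```python
import json

def iter_sse_content(chunks):
    """Yield text fragments from raw byte chunks of a streamed chat response.

    Hypothetical helper: assumes each event is a line of the form
    'data: {json}' and that the stream ends with 'data: [DONE]'.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # An event can be split across chunks, so only consume complete lines.
        while b"\n" in buffer:
            raw_line, buffer = buffer.split(b"\n", 1)
            line = raw_line.decode("utf-8").strip()
            if not line.startswith("data:"):
                continue  # blank separator lines and other fields
            payload = line[len("data:"):].strip()
            if payload == "[DONE]":
                return
            delta = json.loads(payload)["choices"][0].get("delta", {})
            if delta.get("content"):
                yield delta["content"]

# Made-up sample: the second event is deliberately split across two chunks.
sample_chunks = [
    b'data: {"choices":[{"delta":{"role":"assistant"}}]}\n\n',
    b'data: {"choices":[{"delta":{"content":"Hel"}}]}\n\ndata: {"cho',
    b'ices":[{"delta":{"content":"lo"}}]}\n\n',
    b'data: [DONE]\n\n',
]
text = "".join(iter_sse_content(sample_chunks))
print(text)
```

The same idea applies inside a CURLOPT_WRITEFUNCTION callback: buffer the bytes, split on newlines, and decode each complete "data:" line as it arrives rather than decoding the whole body at the end.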


Nope, in my experience it is all the same whether you pay or you don’t.

composer require openai-php/client

You are welcome.


Hello, I’ve noticed that GPT-3.5 Turbo has been responding very slowly for the past two days. It’s taking over a minute to process 1000 tokens. I’m wondering if this problem will continue or if it will be resolved soon. Has anyone else experienced this issue recently, and if so, how often? Any solution?

Seems typical to me:

Well now, top o’ the mornin’ to ya! I’m Mac o’Paddy, the jolliest leprechaun
ye’ll ever meet. I’m a wee bit mischievous, but always with a heart full o’
gold. I’ve been wanderin’ these green hills of Ireland for centuries, guardin’
me pot o’ gold at the end of the rainbow. So, what brings ye to me humble abode

[stop] 60 words/95 chunks, chunk 1 in 0.502 seconds
95 tokens in 2.9 seconds. 33.3 tokens/s @ rate: 40.4 tokens/s)

more chat…


##>Where can I catch my own leprechaun?
Ah, catchin’ a leprechaun, is it? Well, I must say, we leprechauns are a
clever bunch, always dodgin’ and weavin’ to keep our treasures safe. But if
ye’re truly determined, I’ll give ye a wee bit of advice. First, ye’ll need
to be as sly as a fox and as quick as a hare. Leprechauns are known for
their craftiness, so ye’ll have to outsmart 'em.

Now, keep yer eyes peeled for a rainbow on a rainy day. Follow it to the
end, where ye might just find a leprechaun’s hideout. But be warned, they’re
mighty tricky. If ye manage to catch one, ye must never take yer eyes off 'em,
for they’ll try every trick in the book to escape. And remember, treat 'em
with respect, for they’re magical creatures with a mischievous streak.

But let me tell ye, me friend, the real treasure lies not in catchin’ a
leprechaun, but in the joy and laughter they bring. So, enjoy the hunt, but
don’t forget to appreciate the magic that surrounds ye every day.

[stop] 175 words/249 chunks, chunk 1 in 0.417 seconds
249 tokens in 6.5 seconds. 38.2 tokens/s @ rate: 40.8 tokens/s)

##>Tell me a story of yore, from the bygone times of the enchanted forest. 8 paragraphs.

Ah, gather 'round, me friend, and I’ll spin ye a tale from the enchanted
forest of yore. Once upon a time, in a land far, far away, there lived a mischievous
leprechaun named Seamus. Seamus was known throughout the forest for his
clever pranks and his love for shiny things.

One fine morning, as the sun peeked through the emerald leaves, Seamus
stumbled upon a hidden glen. In the heart of the glen stood a majestic oak
tree, its branches reaching towards the heavens. Seamus felt a strange pull,
as if the tree held a secret. He approached cautiously, his eyes twinkling
with curiosity.

As Seamus drew closer, he noticed a tiny door nestled within the trunk of
the oak tree. With a mischievous grin, he whispered the secret words he had
learned from an ancient fairy. The door creaked open, revealing a magical
realm beyond his wildest dreams.

Inside, Seamus discovered a bustling community of woodland creatures.
Fairies danced on petals, gnomes tinkered with their inventions, and talking
animals shared tales of wisdom. Seamus was enchanted by the beauty and
harmony of this hidden world.

One day, as Seamus explored deeper into the enchanted forest, he stumbled
upon a shimmering pond. Its waters sparkled with a thousand colors,
reflecting the magic of the surrounding trees. Seamus dipped his hand into
the pond and felt a surge of energy coursing through his veins.

Little did Seamus know, the pond held the power to grant wishes. With a
mischievous glint in his eye, he made a wish for endless pots of gold. But
the forest, wise as it was, knew that true happiness lay not in material
wealth. So, instead of granting his wish, the pond bestowed upon Seamus a
heart filled with kindness and a spirit of adventure.

From that day forward, Seamus used his newfound magic to spread joy and
laughter throughout the enchanted forest. He played pranks that brought
smiles to faces, and he shared his treasures with those in need. The forest
flourished under his care, and the creatures within it lived in harmony.

And so, the tale of Seamus, the mischievous leprechaun with a heart of
gold, became a legend in the enchanted forest. To this day, his spirit lives
on, reminding us all that true magic lies not in what we possess, but in the
love and kindness we share with one another.

A minute for 1000, you say?

[stop] 398 words/507 chunks, chunk 1 in 0.470 seconds
507 tokens in 13.2 seconds. 38.3 tokens/s @ rate: 39.7 tokens/s)

Are you making it think too hard?

You want slow? How about the time to spin up my fine-tune after inactivity?

[stop] 64 words/101 chunks, chunk 1 in 15.994 seconds
101 tokens in 17.1 seconds. 5.9 tokens/s @ rate: 89.8 tokens/s)

but after loaded it can burst out tokens:

[length] 386 words/512 chunks, chunk 1 in 0.216 seconds
512 tokens in 6.1 seconds. 83.3 tokens/s @ rate: 86.3 tokens/s)
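For anyone reading these stat lines: the two figures appear to be the overall throughput (tokens divided by total time) and the generation rate once the first chunk has arrived (tokens divided by total time minus the first-chunk delay). That reading is inferred from the numbers above, not from the tool's source. A small Python sketch, with a hypothetical helper name:

```python
def throughput(tokens, total_seconds, first_chunk_seconds):
    """Return (overall tokens/s, tokens/s after the first chunk arrived).

    Hypothetical helper; the second figure excludes the initial wait,
    which is how the "@ rate" numbers above seem to be computed.
    """
    if total_seconds <= first_chunk_seconds:
        raise ValueError("total time must exceed the first-chunk delay")
    overall = tokens / total_seconds
    generation = tokens / (total_seconds - first_chunk_seconds)
    return overall, generation

# The cold fine-tune run above: 101 tokens, 17.1 s total, first chunk at 15.994 s.
overall, generation = throughput(101, 17.1, 15.994)
print(f"{overall:.1f} tokens/s overall, {generation:.1f} tokens/s once streaming")
```

Because the logged totals are rounded to one decimal, recomputing from them lands close to, but not exactly on, the logged rates. The point is that a long first-chunk delay drags the overall figure down even when generation itself is fast.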

But I am not getting that quick a response, even in the playground. I am just asking for simple things like paraphrasing. I just checked on Colab: it took around 1 min 30 sec to generate a response of approximately 600 words.

How can I solve the problem of getting an error after more than 60 seconds?

See if it is a problem with your “print” and environment.

Here’s simplified (well, simplified from what I demonstrated earlier) example code for probing times, first set things up:

import openai
import time
import json
openai.api_key = key  # "sk-1112222...", or import os + import env variable

create = {"model": "gpt-3.5-turbo", "top_p": 10e-9, "messages":
 [{"role": "system", "content": "You are a helpful assistant."},
 {"role": "user", "content": "write an essay on "
 "digital transformation, 100 paragraphs."}]}

Then let’s try different response lengths, so you can also evaluate network latency:

for max_tokens_setting in [1,100,1000]:
  start = time.time()
  max_tokens = {"max_tokens": max_tokens_setting}
  openai_object = openai.ChatCompletion.create(**create, **max_tokens)
  openai_data = json.loads(str(openai_object))
  done = time.time() - start
  tokens = openai_data['usage']['completion_tokens']
  print(f"[{tokens} tokens in {done:.1f}s. "
    f"{(tokens/done):.1f} tokens per second (with latency)]")

I’m receiving the response into a string, so there’s no other hokey code delaying measurement.


[1 tokens in 2.2s. 0.5 tokens per second (with latency)]
Title: Embracing Digital Transformation: Revolutionizing the Way We Live and Wor
[100 tokens in 3.2s. 31.0 tokens per second (with latency)]
Title: Embracing Digital Transformation: Revolutionizing the Way We Live and Wor
[1000 tokens in 20.3s. 49.3 tokens per second (with latency)]
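One way to read these three lines is to treat response time as a fixed overhead (network plus queueing plus prompt processing) plus a per-token cost, and solve for both from the shortest and longest runs. A rough Python sketch under that linear assumption; the function name is hypothetical:

```python
def split_latency(short_run, long_run):
    """Estimate (fixed overhead in seconds, generation rate in tokens/s).

    Each argument is a (completion_tokens, seconds) pair. Assumes
    time = overhead + tokens * per_token_cost, which is only an approximation.
    """
    (n1, t1), (n2, t2) = short_run, long_run
    per_token = (t2 - t1) / (n2 - n1)
    overhead = t1 - n1 * per_token
    return overhead, 1.0 / per_token

# Using the 1-token and 1000-token measurements above.
overhead, rate = split_latency((1, 2.2), (1000, 20.3))
print(f"about {overhead:.1f}s fixed overhead, about {rate:.1f} tokens/s generation")
```

With the numbers above this comes out at roughly 2 seconds of overhead and around 55 tokens per second, so for short replies most of the wait is overhead rather than generation.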

I don’t think it’s a problem of print and environment. I tried your code; here is the result.

The response is quite slow. I am consistently experiencing this for all my requests. Even in the playground, I have a similar experience. I do not know what to do now. It has been more than two days.

We’re trying to implement the GPT 3.5 API, but the latency times from Postman / curl are incredibly slow.

For 270 prompt tokens and 299 completion tokens (569 total), it is taking:

34 seconds

The same prompt through ChatGPT 3.5 takes about 1 second.

The test prompt simply asks for the definition, synonyms, and etymology of a word we pass to it.

Is this a known issue? Seems incredibly slow…

There are a few things that could be happening:

  • platform: run python locally and see
  • datacenter: you could be routed to a slower one by geography
  • account: different accounts, different levels? (unlikely)

I hit DNS servers around the globe and got the same IP for, and they don’t advertise a so you can get to a particular api endpoint (if there’s even more than one).

Azure has multiple datacenters where you specifically deploy your OpenAI instance.

Then, I suppose, try a second account: feed it $5 and see if you are discriminated against, or if it’s my monthly billing and history that gets me onto fast machines.

Or fine-tune for 4x speed from what you’re getting at 8x the cost…

BTW, my previous speed test code seemed wordy and expository, so I took care of that:

import openai; from openai.util import convert_to_dict as e
from openai import ChatCompletion as f; from time import time as q
openai.api_key = "sk-1234"
def g(z):
 return [z['usage']['completion_tokens'],z['choices'][0]['message']['content'][:80]]
c={"model":"gpt-3.5-turbo","messages":  # model name assumed from the earlier example
[{"role":"system","content":"You are a helpful assistant."},
{"role":"user","content":"write an article on digital transformation, 10000 words."}]}
for m in [1,128,512]:
 s=q();m={"max_tokens": m};o=e(f.create(**c,**m));d=q()-s;x=g(o)[0]
 print(g(o)[1]+f"\n[{x} tokens in {d:.1f}s. {(x/d):.1f} tps]")

Yes, I am experiencing the exact same issue with my requests. In the last 2 days, the response times have shot up. This is a serious problem, as the provider hosting our server has a 30-second timeout policy and most of our requests are failing. We will need to change hosting provider for sure, but in the short term, is this a known issue? We haven’t experienced this issue to the same extent in the 6-7 months of using this API.
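As a stopgap while stuck behind a hard timeout, one option is to cap max_tokens so that a response can finish inside the limit. A back-of-the-envelope Python sketch; the function name and the 2-second safety margin are assumptions, and the overhead and rate should come from your own measurements (the thread above saw very different values from different accounts and days):

```python
import math

def max_tokens_for_timeout(timeout_s, overhead_s, tokens_per_s, margin_s=2.0):
    """Largest completion length expected to fit inside a hard timeout.

    Hypothetical helper: subtracts the fixed overhead and a safety margin
    from the timeout, then multiplies by the measured generation rate.
    """
    usable = timeout_s - margin_s - overhead_s
    return max(0, math.floor(usable * tokens_per_s))

# Illustrative inputs: 2 s overhead and 40 tokens/s, similar to figures reported above.
for limit in (10, 30):
    print(limit, "s timeout ->", max_tokens_for_timeout(limit, 2.0, 40), "tokens")
```

This only helps when the generation rate is stable. If the service slows to a few tokens per second, as some posts here describe, no reasonable cap fits, and streaming or a longer timeout is the remaining option.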