Let's compare our API speeds. It's too slow!

Hi,

There are so many topics complaining that the API is much slower than the Playground, and there are no useful answers yet.

Let’s compare our numbers.

I get these timings for text generation:
GPT-4: 36 sec
GPT-3.5: 11 sec
GPT-3: 5 sec

curl requests from a Unix console:

GPT-4

time curl -s https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-YOUR_API_KEY" \
  -d '{
    "model": "gpt-4",
    "max_tokens": 1000,
    "messages": [{"role": "user", "content": "Write text about cats"}]
  }'

GPT-3.5-TURBO

time curl -s https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-YOUR_API_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "max_tokens": 1000,
    "messages": [{"role": "user", "content": "Write text about cats"}]
  }'

GPT-3

time curl -s https://api.openai.com/v1/engines/text-davinci-003/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-YOUR_API_KEY" \
  -d '{"prompt": "Write text about cats", "max_tokens": 1000}'
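A fair comparison needs the token counts too, since each completion response includes a `usage` object. A minimal sketch of normalizing a timing by that count (the 850-token and 36-second figures below are made-up examples, not measurements):

```shell
# Hypothetical numbers: suppose the GPT-4 call above returned
# "usage": {"completion_tokens": 850, ...} and `time` reported 36 s.
completion_tokens=850
elapsed_seconds=36
rate=$(awk -v t="$completion_tokens" -v s="$elapsed_seconds" 'BEGIN { printf "%.1f", t/s }')
echo "$completion_tokens tokens in ${elapsed_seconds}s = $rate tokens/s"   # 23.6 tokens/s
```

If jq is installed, the count can be read straight from a saved response, e.g. `jq '.usage.completion_tokens' response.json`.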

P.S.: maybe someone official from OpenAI could give us some advice?

It might be a timing thing, but in general, keep an eye on https://openai-status.llm-utils.org/ .
It’s an unofficial tracker for GPT speed and can give you a fair idea of the server status at a given point in time.


That is pointless without seeing how many tokens were generated.

Report for 5 trials of gpt-3.5-turbo:
For total response time (s) | Min: 1.634, Max: 4.0, Avg: 2.59
For latency (ms) | Min: 501, Max: 2823, Avg: 1492.20
For response tokens | Min: 50, Max: 50, Avg: 50.00
For total rate (tokens/s) | Min: 12.5, Max: 30.6, Avg: 21.18
For stream rate (tokens/s) | Min: 42.48, Max: 52.46, Avg: 45.67

Report for 5 trials of gpt-3.5-turbo-16k:
For total response time (s) | Min: 1.801, Max: 2.368, Avg: 2.08
For latency (ms) | Min: 295, Max: 1105, Avg: 763.80
For response tokens | Min: 50, Max: 50, Avg: 50.00
For total rate (tokens/s) | Min: 21.11, Max: 27.76, Avg: 24.24
For stream rate (tokens/s) | Min: 33.2, Max: 42.41, Avg: 38.25

Report for 5 trials of gpt-3.5-turbo-0301:
For total response time (s) | Min: 1.867, Max: 2.251, Avg: 2.11
For latency (ms) | Min: 500, Max: 793, Avg: 670.80
For response tokens | Min: 50, Max: 50, Avg: 50.00
For total rate (tokens/s) | Min: 22.21, Max: 26.78, Avg: 23.83
For stream rate (tokens/s) | Min: 33.69, Max: 36.57, Avg: 34.82
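The rates in these reports look like simple derived quantities. A sketch of the presumed formulas, using one hypothetical trial (the numbers below are illustrative, and pairing the minimum total time with the minimum latency is an assumption):

```shell
# One hypothetical trial: 50 response tokens, 1.634 s total, 501 ms latency
tokens=50
total=1.634     # seconds from request start to last token
latency=0.501   # seconds until the first streamed token arrives

# "Total rate": tokens over the whole request, startup latency included
total_rate=$(awk -v t="$tokens" -v s="$total" 'BEGIN { printf "%.1f", t/s }')
# "Stream rate": tokens over the streaming portion only
stream_rate=$(awk -v t="$tokens" -v s="$total" -v l="$latency" 'BEGIN { printf "%.1f", t/(s-l) }')

echo "total rate:  $total_rate tokens/s"    # 30.6
echo "stream rate: $stream_rate tokens/s"   # 44.1
```

The gap between the two rates is why latency matters as much as throughput: the streaming portion can be fast even when the overall call feels slow.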

I don’t think this is a timing thing; I’ve never gotten the numbers you refer to.
My current results are 46/9/3 seconds, always about twice as long as the statistics behind that link.
I agree with my customers, who say my service is unusable.
I’m currently searching for alternatives to OpenAI (unfortunately).

It took 1.5 minutes to generate a response of a mere 11 tokens.
The API has become miserably slow in the last few days.

Same for me. If I make a call in the Playground it’s really fast; the same call with the same prompt through the API is much slower.

Well, the Playground calls the same models the API does; it’s just a wrapper around the API.

It might be worth looking at your code base, updating your libraries, and investigating your networking situation.
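For the networking check, curl's `-w` timing variables can split a request into DNS, connect, TLS, time-to-first-byte, and total time, which separates network overhead from generation time. A sketch that prints the diagnostic command for review rather than running it (sk-YOUR_API_KEY is a placeholder):

```shell
# curl -w format string: each %{...} is a built-in curl timing variable
W_FORMAT='dns: %{time_namelookup}s  connect: %{time_connect}s  tls: %{time_appconnect}s  first byte: %{time_starttransfer}s  total: %{time_total}s\n'

# Printed instead of executed so it can be reviewed (and a real key added) first
cat <<EOF
curl -s -o /dev/null -w '$W_FORMAT' \\
  -H "Content-Type: application/json" \\
  -H "Authorization: Bearer sk-YOUR_API_KEY" \\
  -d '{"model": "gpt-3.5-turbo", "max_tokens": 50, "messages": [{"role": "user", "content": "Write text about cats"}]}' \\
  https://api.openai.com/v1/chat/completions
EOF
```

If `first byte` is large but the earlier phases are small, the wait is on the model side; if DNS, connect, or TLS dominate, the problem is local networking.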

This is not true.

Technically it could be just a UI for the API, but look at this forum: there are so many topics about a significant speed difference between the two.

I agree that a number of forum users have experienced a reduction in performance, and there is an announcement that users with low usage or new accounts may not be on the lower-latency servers. That still does not rule out a local infrastructure issue, and it is worth checking.

Unfortunately, the developer forum cannot investigate issues at this level; you will need to reach out via help.openai.com to leave your details and describe your issue.

help.openai.com is useless. They closed the ticket saying my issue was resolved, but there are still too many 500 errors. The API is now basically too slow to be useful.