We proved the API is intentionally slow

It’s like Steve Jobs once said: you’ve got two options, use it or don’t!

There are always alternatives. I see so many people getting stressed over slowdowns and performance, which is understandable, but they need to remember this is all beta and could be pulled away at any moment.

I saw a developer going nuts about the service being down, presumably because he was running a paid service on it. It feels like people are building their houses on sand by relying on a beta platform and reselling it as production-ready code.

Take it as it is, and if you don’t like it, try one of the many, many self-hosted options. Or, as someone else suggested, apply for an Enterprise account via Azure. Until then, I’m afraid they are free to prioritise their own service or slow down the API whenever they want. There is no SLA, so at least we know what we’re getting.

3 Likes

I’m an enterprise customer on Azure with the 32k model. It’s 404ing currently.

2 Likes

Interesting to hear. Do they offer any kind of SLA on this package? They are supposedly on different hardware, so I’m curious whether the outages are linked.

I’m starting to think there is something in all this, and seeing it happen on Azure as well makes you think they are preventing people from going too far. Time to try other models!

No SLA, it’s a preview. Understandable, but my Azure bill JUST for GPT-4-32k is $800 so far this month. I would suspect $400 of that is timeouts.

1 Like

Perhaps they should introduce two tiers: a budget tier for people who care about cost savings more than speed, and a premium tier with a guaranteed express quality of service for mission-critical use cases. Like we do with online, near-line, and cold storage buckets.

That’s criminal!


Is GPT-4 on their website (ChatGPT Plus GPT-4, right?) faster than GPT-4 on their API? And which GPT-4 version is ChatGPT Plus using: GPT-4, GPT-4-0314, or something else? That could cause a speed difference, especially if it is actually ChatGPT (GPT-3.5) and not the GPT-4 series.

I’ve noticed GPT-4-0314 is slow after using it a lot (paid), compared to ChatGPT 3.5 on the website (free). But of course GPT-4 is much more powerful. From memory, maybe 10 times slower? Like, in the time the one writes jndfnsdfkjnsdff, the other writes jdsfd.

I have a question. Does ChatGPT Plus let you select which GPT-4, e.g. GPT-4-0314? If not, maybe the bigger mommas are on the API and are smarter, hence take longer :stuck_out_tongue: Just a thought. Maybe.

I completely agree with you; I have the same feeling about Altman. The chat box wasn’t his idea, the NLP breakthroughs came out of intuitions made at Facebook, and now they are scared of being overtaken by a competitor. We are talking about the next Google; that’s why Microsoft bought in.
They don’t understand, or they are silly enough to think they can dominate the market just by limiting the power of the API. Mr. Altman, I will laugh so loud when I see you collapsing. You started the company with different ideals, then you turned to profits, keeping all the power to yourself. This will stir up more competition than you can imagine. It’s useless to hire hackers to attack your possible competitors and steal their information and prototypes using ChatGPT; you will do it, and keep on doing it, but it will come to an end. This power doesn’t belong to you, Mr. Altman. Shall we talk about the limitations you are imposing on users? You invent a car, then you complain that it can kill people? How silly is that? Shall we remove from all the libraries the books on how to make a bomb at home? Shall we censor MacGyver? Your work is done, don’t you get it? Even if the chat box was your idea, it’s like discovering fire, with which people can do so much more, and now you don’t want to distribute the fire so that no one else can use it??? What will ChatGPT 10 be? An idiot with restrictions left and right that only shows things the user must buy? They must remove you from the direction of this powerful company, or another one will rise very soon, with ethical people who don’t crave money. We are on the edge of a new era, and these humans are showing their worst. $0.0002 per token? You are out of your mind!

2 Likes

Here are some of my results.
The setup is:

  1. I have a ~3,000-token input messages array that I send to the OpenAI GPT-4 endpoint repeatedly, with a delay between each call to spread things out temporally (a sketch of the loop follows this list).
  2. The prompt is identical each time, and the same function is being called on a reasonably stable connection.
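For reference, a minimal sketch of what such a timing loop might look like, assuming the openai Python package (the messages array, call count, and delay are illustrative, not the exact script):

import time
import openai

# MESSAGES stands in for the fixed ~3,000-token messages array described above (illustrative)
timings = []
for _ in range(100):
    start = time.time()
    openai.ChatCompletion.create(model="gpt-4", messages=MESSAGES)
    timings.append(round(time.time() - start, 2))
    time.sleep(30)  # spread the calls out temporally (actual delay value assumed)

print(timings)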

Here is the time taken for each response, in seconds. One query took 750+ seconds :face_with_open_eyes_and_hand_over_mouth:

[chart: response time per call, in seconds]

Excluding Outliers

Raw Data
74.01, 69.77, 64.05, 75.07, 60.33, 66.94, 69.35, 60.39, 69.35, 41.54, 60.39, 69.44, 72.59, 59.13, 54.44, 51.78, 67.16, 46.78, 62.31, 71.04, 462.11, 70.76, 65.33, 66.8, 78.52, 62.84, 54.97, 64.01, 60.12, 47.44, 45.55, 70.76, 85.81, 64.37, 48.98, 57.08, 62.9, 54.19, 59.33, 68.92, 53.82, 56.19, 92.56, 82.92, 72.54, 97.97, 62.21, 55.62, 145.21, 51.86, 77.07, 61.59, 67.27, 48.4, 55.75, 54.2, 47.46, 62.52, 48.82, 59.06, 57.96, 58.84, 68.95, 74.46, 64.64, 56.41, 59.1, 45.67, 64.85, 35.93, 39.05, 67.4, 76.44, 58.63, 87.14, 71.72, 765.41, 77.45, 68.03, 96.79, 85.01, 76.58, 76.63, 63.8, 56.61, 72.19, 63.29, 48.79, 59.92, 84.21, 83.25, 58.71, 56.99, 84.58, 71.19, 63.88, 69.14, 74.72, 56.48, 65.16, 60.64

How is a median time of 64 seconds acceptable?
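For anyone who wants to verify, the summary statistics are easy to reproduce from the raw numbers with Python’s statistics module (the list below is truncated for space; paste in all 101 values from above):

import statistics

times = [74.01, 69.77, 64.05, 75.07, 60.33]  # truncated here; use the full raw data above
print(f"n={len(times)}  median={statistics.median(times):.2f}s  mean={statistics.mean(times):.2f}s")
# with the full data the median comes out around 64 s, even keeping the 462 s and 765 s outliers in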

3 Likes

I agree with this proposal.
I applied for the waitlist five hours before the release of GPT-4, but I still haven’t been granted access.
This seriously prevents me from rolling out my better products to my users.

This makes my opinion of OpenAI very low, very bad.

Actually, if you enable stream=True in the API call, instead of waiting for the full JSON response to come back, the ChatGPT API begins streaming you tokens immediately. It’s fairly simple to achieve near-instant first-token responses with the 3.5-turbo model.

model="gpt-3.5-turbo",
messages=messages,  # chat models take a messages array rather than a raw prompt
max_tokens=max_response_length,
temperature=0.5,
stream=True,  # start streaming tokens immediately instead of waiting for the full JSON response
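To actually consume the stream, you iterate over the chunks as they arrive; a minimal self-contained sketch using the openai Python package’s ChatCompletion interface:

import openai

stream = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello"}],
    stream=True,
)
for chunk in stream:
    delta = chunk["choices"][0]["delta"]  # each chunk carries a small content delta
    print(delta.get("content", ""), end="", flush=True)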

3 Likes

But what is the size of the output (how many tokens)?

This is what I generally experience too with GPT-4. Realize that GPT-4 is still in Limited Beta right now. If your application can’t handle the long delays, you should switch to one of the other models that aren’t in Limited Beta. So switch to DaVinci or Turbo.

I would expect that once GPT-4 gets out of Limited Beta, it will go faster, but right now it’s obvious the demand for the API is outstripping their server capacity. So they have to build this capacity out, which may not be trivial given the specific hardware requirements of GPT-4.

The output is unbounded. With the Chat APIs, the max_tokens argument is optional and defaults to inf, so generation runs until a stop sequence or the context limit.
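So if bounding latency matters, it may help to cap max_tokens explicitly and then check what you actually got back; non-streamed responses report their own token counts. A small sketch (the messages variable and the numbers in the comments are illustrative):

import openai

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=messages,  # whatever input array you are sending (assumed defined elsewhere)
    max_tokens=256,  # explicit cap; otherwise generation runs until a stop sequence or the context limit
)
print(response["usage"])  # e.g. {'prompt_tokens': 3000, 'completion_tokens': 256, 'total_tokens': 3256}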

Well, I hope so too.
However, it is priced pretty high, and it is in limited beta, so presumably the user count is low as well. If OpenAI can’t find a way to scale their infrastructure to cater even to this smaller pool, they had better have a few particularly good tricks up their sleeve for when the gates are thrown open to the wider public.

Having said that, I published the data (acquired at some personal expense, as you might surmise) just to quantitatively validate and support the common perception of slowness.

1 Like

I’ve been reading that after the recent success of ChatGPT, everyone and their dog wants to start their own Large Language Model company, and it’s hard to find AI-specific servers and server farms. These are in short supply and need to be built!

But I certainly appreciate your data on the API @vaibhav.garg, it’s good data and is congruent with what I have experienced!

Also, a graph over time would be cool, like one dot per day; maybe we could see whether things are getting better or worse over time.
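A rough matplotlib sketch, assuming each call’s timestamp and duration have been logged (the variable names are hypothetical):

import matplotlib.pyplot as plt

# call_times: list of datetime objects, one per logged call; durations: matching response times in seconds
plt.scatter(call_times, durations, s=8)
plt.xlabel("time of call")
plt.ylabel("response time (s)")
plt.title("GPT-4 API response time over time")
plt.show()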

That is exactly what I was going to ask: was the initial test done with partial results (streaming) enabled?!

1 Like

Thanks Curt.

It just so happens that I have permanently decorated my OpenAI API call function to log a bunch of stuff (inputs, outputs, time taken, exceptions, when it was run, etc.). Now that you ask, I was reminded that I might have that data sitting in my database.
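(For the curious, the pattern is just a Python decorator wrapping the API call; a rough sketch of the idea, not my exact code:)

import functools
import time
from datetime import datetime

def logged(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        record = {"fn": func.__name__, "ran_at": datetime.now().isoformat(), "inputs": repr(kwargs)[:500]}
        start = time.time()
        try:
            result = func(*args, **kwargs)
            record["output"] = repr(result)[:500]
            return result
        except Exception as exc:
            record["exception"] = repr(exc)
            raise
        finally:
            record["seconds"] = round(time.time() - start, 2)
            print(record)  # in my case this goes into a database instead
    return wrapper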

Will try and fish it out!
Cheers.

1 Like

Closing this for now. To answer the OP’s question: no, we do not make the API slow intentionally. I spend north of 10 hours a day, 6+ days a week advocating for and building things for developers inside of OpenAI. We would never intentionally slow things down. The API is not running with the exact same setup as ChatGPT, which is why you see a different response time. You are also likely using a shared engine, which makes things much slower and less predictable.

11 Likes