Chat GPT's API is significantly slower than the website with GPT Plus

Microsoft just dropped the nerf bat. One request every min now. Even though my cognitive bill is $400 so far this month, they restrict me now. They are clamping down on AI agents instead of just expanding infrastructure. Sad. They are limiting this technology and development.

1 Like

is it so hard to people here to realize that they are intentionally limit this to make sure none of you is going to build something big enough before them? this industry is not for you indies and devs.
This is like the hardware chip industry, only the big companies have a place in this.

This is probably a mistake because in weeks, many small startups will start offering this exact capability, don’t forget they invent nothing new, this tech was actually developed outside the US, and most researches are outside the US, it’s just that THEY trained it with the whole internet data, having friends and access to ceo’s of Reddit, Github etc.

ok, i figured out. everything works like expected and is fast. the answer is: stream
and i was wrong. stream is working also with chat api.

1 Like

What I don’t get is why they don’t just raise prices for the API to something more demand/supply based. Cloud does this and it works just fine. As new resources come in, they can lower prices.

Leave the web interface alone, of course, so people can play around, but if people have high value cases which don’t use a lot of tokens, they should be able to use the API - imho.

1 Like

Mind explaining further what you mean by ‘stream’? How did you manage to make the API respond quickly? Thanks.

1 Like

Surprisingly, it seems that with the API, GPT-3.5 Turbo, GPT-4, and Text-Davinci-003 are slower than Text-Davinci-002 recently. My company account’s logic was so slow with GPT-4 that I tested it directly using my Postman call. Apparently, all of them are slower than Text-Davinci-002, even with the same prompt and Text-Davinci-002’s longer text completion.

However, Text-Davinci-002’s accuracy is not good enough for my use case. Therefore, even though it is faster than the other models, I cannot use it. :disappointed_relieved:

Can anyone else try Text-Davinci-002 instead of the other models and see the speed?

1 Like

search for “stream” on this site


Any news or solution about this?
In ChatGpt plus I am having 8 times more token than compared to the API (gpt 3.5 turbo) in the same time.
This is affecting our project a lot. :slightly_frowning_face:

1 Like

The underlying implementation can summarize context and previous history to keep the thread of older conversations without needing to spend as many tokens.

It may also be helpful to only include user text, not completed bot text, in the context. It seems perfectly capable to keep the thread anyway when doing this.

That being said, models with less context will be faster, because the cost of the model goes up with approximately the square of context. It also goes up by a constant factor related to the number of parameters. Doing less work, will run faster, all else being the same.

Same here. I tried to buy the subscription but the API response time didn’t change it. With this reponse time API are not usable!


Having same issues on /chat/completion endpoint responses with or without stream option, both gpt-3 and gpt-4 are extremely slow. Used Postman to test and even OpenAI Playground response times are also very slow.

1 Like

Same issue here, I’m using 3.5turbo. Is there any way to have a response from the support team? We all need to understand whether the time spent on developing can be repaid in the near future and we can offer a good service to our customers or not.


Just adding my +1 here in hopes that that helps get attention to this problem. I have to admit, I am a bit shocked that the paying members get less server preference then the free members? Seemingly. Doesn’t make fundamental sense to me?

The same issue. I’m using text_davinchi_003, and the average response latency is 45 seconds with a token limit of 500. What’s interesting is that when doing requests in one thread, the initial response is typically received in 10-15 seconds, but each subsequent request adds an additional 5-15 seconds to the latency time until the last one get back with error. Doing delay (up to 2 sec) between the requests doesn’t make effect.

It seems like the delay is related to the “user” identifier passed in the request. I began to observe a gradual decline in response speed, which worsened as time went on. Eventually, it reached a point where it became extremely poor, leading me to believe that users would soon start expressing their dissatisfaction. Surprisingly, however, I haven’t received any complaints yet. In short, through deduction, I realized that requests using a new user identifier yielded much faster results compared to the one I had been using. I’m curious to know if others have encountered a similar situation.