Hi All,
I’m developing an application that depends on sending moderately sized Word documents to the GPT-3.5 Turbo API and receiving a response. The problem is that processing one request takes about 80 seconds, which is huge given that I intend to make this a corporate-grade application that would need to process hundreds of documents, so it will be far too slow for the user. Any clue how to speed things up?
Sometimes the API can be slow due to high demand. You could try using Azure OpenAI, which is approximately 2x faster for half the cost.
If you don’t want to or can’t use Azure OpenAI, you could try breaking your requests down into smaller pieces. For example, instead of processing the entire document at once, you could break it into, say, 5 pieces and then send those to the API.
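A minimal sketch of that splitting step, assuming the text has already been extracted from the Word document (the paragraph-based split and the 5-piece default are just illustrative):

```python
def chunk_text(text: str, num_chunks: int = 5) -> list[str]:
    """Split a document into roughly equal pieces on paragraph boundaries."""
    paragraphs = text.split("\n\n")
    per_chunk = -(-len(paragraphs) // num_chunks)  # ceiling division
    return ["\n\n".join(paragraphs[i:i + per_chunk])
            for i in range(0, len(paragraphs), per_chunk)]
```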
Additionally, many AI services support a technique called streaming, where the text is generated progressively as the process unfolds, creating the effect you see in ChatGPT. It’s more visually appealing and would give your users a better experience.
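For reference, here is a minimal streaming sketch with the openai Python client; the model name and prompt are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarize the attached text: ..."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # some chunks (e.g. the final one) carry no text
        print(delta, end="", flush=True)
```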
Thanks a lot, I thought Azure OpenAI would be more expensive.
Indeed, speed and performance can fluctuate quite a bit. Here are a few tips and reference links you could have a look at:
- Data Chunking: For processing large texts, consider chunking them into smaller, manageable pieces without losing context. I remember finding this video pretty useful: video on Data Preparation for LLMs.
- Asynchronous Processing: Handle API responses in the background so several documents can be in flight at once (see the async sketch after this list). Asynchronous Programming.
- Enterprise Solutions: Have a read of this NVIDIA blog post on their approach to retrieval-augmented generation apps: blog post.
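To illustrate the asynchronous point above, a rough sketch using the openai client’s async interface (the summarize() helper and the sample chunks are made up for illustration):

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()  # assumes OPENAI_API_KEY is set in the environment

async def summarize(chunk: str) -> str:
    resp = await client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"Summarize:\n{chunk}"}],
    )
    return resp.choices[0].message.content

async def main(chunks: list[str]) -> list[str]:
    # All requests are in flight at once instead of one after another.
    return await asyncio.gather(*(summarize(c) for c in chunks))

results = asyncio.run(main(["chunk one ...", "chunk two ..."]))
```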
Hi aelfouly. You have a good question and one that is only hinted at within your account and documentation.
Usage tiers
You can view the rate and usage limits for your organization under the limits section of your account settings. As your usage of the OpenAI API and your spend on our API goes up, we automatically graduate you to the next usage tier. This usually results in an increase in rate limits across most models. Organizations in higher tiers also get access to lower latency models.
What OpenAI doesn’t articulate clearly is “those that haven’t paid at least $50 into their account will now have their organization set to a reduced token production rate, slowing the responses from AI models.”
That linked limits page shows at the bottom how to increase to the next tier (by further prepayment), but it won’t say exactly how much more you need - you have to do the math yourself.
“Assistants” themselves are slow if you are using them, especially when you ask the API to process files into text the AI can understand. The assistants AI is then also slowed, needing to chunk and browse that text if it is large. There is also no reporting of per-run token consumption to let you understand Assistants API call costs.
Corporate grade means building your own vector database of knowledge and using semantic search to return relevant results for a user’s input.
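A toy version of that idea, using OpenAI embeddings with an in-memory matrix standing in for a real vector database (the sample documents and model choice are just illustrative):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

# Stand-in for a real vector database: an in-memory matrix of chunk embeddings.
docs = ["Refund policy: ...", "Shipping times: ...", "Warranty terms: ..."]
doc_vectors = embed(docs)

def search(query: str, top_k: int = 2) -> list[str]:
    q = embed([query])[0]
    # Cosine similarity between the query and every stored chunk.
    scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in np.argsort(scores)[::-1][:top_k]]

print(search("How long does delivery take?"))
```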
This is really informative, thank you so much.
Have a look at one of the community leaders’ tips on thanking; references are provided in the thread. I’ll link it here:
Thanks, but this talks about the limits, not the processing speed. What I’m facing is 80 seconds of delay when processing a single request, which could be drastic when scaled up to a large number of documents. Of course the limits problem will appear then, but it’s not what I’m facing now. I need to process the request almost instantaneously, the same way ChatGPT processes it.
I did some quick testing using a free-tier account. 3.5-turbo-1106 has decent speed. 3.5-turbo outputs at half speed, which is not as bad as it was in October and November.
To diagnose a speed issue, the first thing to do is go to the playground and compare streaming speed with free ChatGPT.
Playground IS your API account and its capabilities.
So you get an evaluation similar to if you had programmed it yourself. Programming it yourself would let you measure output speed better, though: the time to get the first token, and then the token generation rate after that, when using the chat completions endpoint with streaming.
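Something like the following would capture both numbers; note that counting stream chunks is only a rough proxy for tokens, and the model and prompt are placeholders:

```python
import time
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

start = time.perf_counter()
first_token_at = None
chunks_seen = 0

stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write three sentences about latency."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter() - start  # time to first token
        chunks_seen += 1  # one chunk is roughly one token

total = time.perf_counter() - start
print(f"first token after {first_token_at:.2f}s, "
      f"~{chunks_seen / (total - first_token_at):.1f} tokens/s after that")
```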
However, I think the overall concern in this topic might be the use of “assistants”, which has not been clarified. Assistants do not stream interactively for user satisfaction; instead you must wait for the entire queued job to be done.
Comparison to ChatGPT? ChatGPT has its own independent qualities. It can show a fast response while your API account is being throttled for being at low payment tier - or ChatGPT can be down and nonfunctional while your API account and model remains in service. They might be testing a model unavailable to you.