Assistant API Performance is Very Slow

Currently I’m working on a chatbot using the Assistants API, and the performance is incredibly slow, with or without files and instructions. The simplest question, e.g. just asking “are you active?”, can take up to 6 seconds, while a more complicated question can take up to 30 seconds to receive an answer. I was concerned I was being rate limited, since my account was only Tier 2, so I tried creating a new account and funding it to start with a clean slate, and I got the same results. Is the Assistants API just broken right now?
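
For reference, here’s roughly how I’m measuring those numbers: wall-clocking a full run end to end, including the polling loop the beta requires (a minimal sketch with the openai Python SDK v1.x; the assistant ID is a placeholder):

```python
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ASSISTANT_ID = "asst_..."  # placeholder -- substitute your own assistant ID

start = time.perf_counter()

# One fresh thread, one user message, one run.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="Are you active?"
)
run = client.beta.threads.runs.create(
    thread_id=thread.id, assistant_id=ASSISTANT_ID
)

# The Assistants beta has no streaming, so we poll until the run finishes.
while run.status not in ("completed", "failed", "cancelled", "expired"):
    time.sleep(0.5)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

print(f"end-to-end: {time.perf_counter() - start:.1f}s (status={run.status})")
```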

Yesterday I upgraded our account from Tier 1 to Tier 3. It made no difference: it is incredibly slow, with huge latency and ongoing errors from the code interpreter. We have demos of our API-based product scheduled for early next week, and it is just embarrassing how slow it is.

That being said, overall speed seems a little better when using model gpt-4 instead of gpt-4-1106-preview.

Similar situation here. I just finished making this thing feature-complete, and the latency is a joke: the responses are good, but at 25–30 seconds I’ll get laughed out of the room if I try to show it off as a complete product. The only hope I have is to show the text as it’s being generated, which might improve perceived response time.
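
As far as I can tell the Assistants beta doesn’t stream yet, but if you can fall back to Chat Completions, token-by-token streaming is straightforward (a minimal sketch with the openai v1.x SDK):

```python
from openai import OpenAI

client = OpenAI()

# Stream tokens as they are generated instead of waiting for the full reply.
# Total time is unchanged, but perceived latency improves dramatically.
stream = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": "Are you active?"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```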

This might be worth checking before your presentations: you can probably switch models, or schedule the demo for the time of day when the API is fastest.

So instead of using gpt-3.5-turbo, one should use gpt-4-turbo to see significant gains? That’s going to be heavy on the bills :stuck_out_tongue:

GPT-4-Turbo is almost the same price as GPT-3.5-Turbo used to be…

Let’s make a list, and check it twice…

In my chart, cost is per 1 million tokens so you can compare the prices more easily:

| Model | Training (per 1M) | Input (per 1M) | Output (per 1M) | Context length |
| --- | --- | --- | --- | --- |
| GPT-3.5-turbo-1106 | n/a | $1.00 | $2.00 | 16k (4k out) |
| GPT-3.5-turbo | n/a | $1.50 | $2.00 | 4k |
| GPT-3.5-turbo-16k | n/a | $3.00 | $4.00 | 16k |
| GPT-3.5 Turbo fine-tune | $8.00 | $3.00 | $6.00 | 4k |
| GPT-4-1106 (turbo) | n/a | $10.00 | $30.00 | 128k (4k out) |
| GPT-4 | n/a | $30.00 | $60.00 | 8k |
| babbage-002 base | n/a | $0.40 | $0.40 | 16k |
| babbage-002 fine-tune | $0.40 | $1.60 | $1.60 | 16k |
| davinci-002 base | n/a | $2.00 | $2.00 | 16k |
| davinci-002 fine-tune | $6.00 | $12.00 | $12.00 | 16k |

Price comparison:

| Tokens | gpt-3.5-turbo | gpt-4-turbo (1106) |
| --- | --- | --- |
| input | 1x | 6.67x |
| output | 1x | 15x |
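
To put that in per-request terms, here’s a quick back-of-the-envelope sketch (prices per 1M tokens from the table above; the 1,000-token-in / 500-token-out request is just a made-up example):

```python
# Prices in dollars per 1M tokens, taken from the table above.
PRICES = {
    "gpt-3.5-turbo": {"input": 1.50, "output": 2.00},
    "gpt-4-1106-preview": {"input": 10.00, "output": 30.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at the listed per-1M-token prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A hypothetical request: 1,000 tokens in, 500 tokens out.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 1000, 500):.4f}")
# gpt-3.5-turbo: $0.0025
# gpt-4-1106-preview: $0.0250
```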

What’s interesting is that I don’t think it’s all that slow in every case. I’ve noticed it’s significantly worse when the text is large than when it’s small, which is likely due to the lack of streaming responses. I ended up migrating to llama-index and setting up an agent with its own custom RAG instead of using the OpenAI Assistants API.
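
For anyone considering the same move, the basic pattern is compact (a minimal sketch assuming the 0.x llama_index API and a local `data/` folder of documents):

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Build a vector index over local documents (the "custom RAG" part).
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# streaming=True prints tokens as they arrive, improving perceived latency.
query_engine = index.as_query_engine(streaming=True)
response = query_engine.query("Summarize the onboarding document.")
response.print_response_stream()
```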

I find that as more runs are done on the same thread, building up messages and context, the response times increase. It’s still usable for me, though; I just don’t let threads get too long, and I create new ones for each user session.
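
Roughly like this (a sketch with the openai v1.x SDK; the in-memory session store is just illustrative):

```python
from openai import OpenAI

client = OpenAI()

# session_id -> thread_id; an in-memory dict purely for illustration.
_threads: dict[str, str] = {}

def thread_for_session(session_id: str) -> str:
    """Reuse the session's thread; each session starts fresh, so
    context (and latency) never builds up across sessions."""
    if session_id not in _threads:
        _threads[session_id] = client.beta.threads.create().id
    return _threads[session_id]

def end_session(session_id: str) -> None:
    # Drop the mapping; the next message from this user gets a new thread.
    _threads.pop(session_id, None)
```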

Sadly, this is a definite issue. None of my responses takes less than 17 seconds, and some take over 60. I did an MVP demo today and the response was great, apart from the obvious comment about how slow it was.

I appreciate we are still in beta. Is there a roadmap that could be shared for the timeline from beta to live? Delivery of my product depends entirely on when this goes live. Rightly or foolishly, I’m assuming there will naturally be an improvement in performance (I could be in trouble here).

Not wanting to look for alternatives at the moment. I was using OpenAI for both ASR (Whisper) and TTS, but I subbed in a local ASR, which is quick and working well, and I’m still using the OpenAI endpoint for TTS, which is a little slow right now. I don’t want to go fully local unless absolutely necessary. Typical response times here vary, but for my use case it’s around 7+ seconds, which is not great!
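
For reference, the TTS call in question is basically this (a sketch with the openai v1.x SDK; `tts-1` and `alloy` are just the defaults I’d reach for, not a recommendation):

```python
from openai import OpenAI

client = OpenAI()

# Synthesize a reply to an mp3 file; tts-1 is the lower-latency TTS model.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello! How can I help you today?",
)
speech.stream_to_file("reply.mp3")
```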

For my use case I need the overall query response + TTS to be between 5 and 10 seconds max. No idea if that’s achievable with OpenAI. Anyone? (I appreciate there are a number of factors in play.)

By way of an update: I switched to a local TTS solution, which is fast and works great. So the only slow part of the process is now GPT-4, which I need for its great reasoning ability; none of the other models quite does what’s required. So, unfortunately, I’m slightly blocked at the moment.

Does anyone have experience with the Microsoft Azure versions from a speed/response perspective?

Would you mind sharing which local TTS solution you started using? I’m having the same pain with latency from OpenAI TTS, even when streaming the response.