Why is the Assistants API slow? Are there any solutions for speed?

I have tried the “Retrieval” tool from the OpenAI Assistants API, and it is too slow. It takes 4–8 seconds for a short prompt and response, and 7–16 seconds for a long prompt and response.

Assistant details:

  • Model: gpt-3.5-turbo-1106
  • No. of files: 1 (.docx)
  • File size: 23.3 KB
  • No. of pages in file: 10 (2,993 words)

Is there something fundamental (like reading documents) that makes assistants slower? Or is this just because it is new? Or is there any way to speed it up? :thinking:

It is because the Assistants API is still under development.

Yes, you are right, it is still in beta. Have you tried anything to improve its speed?

No, I did not do anything to improve the speed. I was looking to use this assistant to generate quizzes based on a PDF that I upload, and to release it as an API, but it did not work properly.

It retrieves information from the file you uploaded. That’s why it’s slow.

Fast way: extract the document to plain text yourself and include it as a RAG-style assistant message after the “system” message, or before the user question. You will see a stream of chat completion tokens within a second.

Slow way: use someone else’s service that puts the document decoding and information access behind embeddings or a function call. You see nothing until the run’s status is “done”, and only then can you retrieve the response.
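The “fast way” might look like the sketch below. This is illustrative, not a definitive implementation: the system prompt wording, the `build_messages`/`ask` helper names, and the model choice are assumptions, and `client` is assumed to be an `openai.OpenAI()` instance from the v1 Python SDK.

```python
def build_messages(document_text: str, question: str) -> list:
    """Assemble a RAG-style prompt: the extracted document text is injected
    as an assistant message between the system prompt and the user question."""
    return [
        {"role": "system", "content": "Answer using only the provided document."},
        {"role": "assistant", "content": "Document contents:\n" + document_text},
        {"role": "user", "content": question},
    ]


def ask(client, document_text: str, question: str) -> str:
    # `client` is assumed to be an openai.OpenAI() instance (openai-python v1).
    stream = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        messages=build_messages(document_text, question),
        stream=True,  # first tokens typically start arriving within a second
    )
    # Concatenate the streamed token deltas into the full reply.
    return "".join(chunk.choices[0].delta.content or "" for chunk in stream)
```

A 3,000-word document like the one above comfortably fits this way, so no retrieval step is needed at all.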

One other thing to keep in mind: streaming makes the Chat Completions API feel faster, so the absence of streaming from the Assistants API is likely one factor in it feeling slower.

Given your use case, you might be better off using the regular Chat Completions API and passing your document along in each request. Your Word document can fit into the context window for a chat completion.

You will also have finer control over what is sent into the context window, and you get the instant streaming response.

When would I use Chat Completions vs. the Assistants API? I want my chatbot to answer questions from my knowledge base, but every query is taking 6–10k tokens, which is too high. How do I optimize for lower token usage and a response time below 5–7 seconds?
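One way to cut per-query tokens is to send only the most relevant chunks of the knowledge base, up to a fixed budget. A minimal sketch, assuming the chunks are already sorted by relevance; the `trim_chunks` helper is hypothetical, and the ~4-characters-per-token ratio is only a rough rule of thumb, not an exact tokenizer count:

```python
def trim_chunks(chunks, max_tokens=3000):
    """Keep the highest-ranked chunks until a rough token budget is hit.
    Uses the common ~4 characters per token approximation."""
    kept, used = [], 0
    for chunk in chunks:  # assumed pre-sorted, most relevant first
        est = len(chunk) // 4 + 1  # rough token estimate for this chunk
        if used + est > max_tokens:
            break
        kept.append(chunk)
        used += est
    return kept
```

Capping the context at a few thousand tokens instead of sending 6–10k per query reduces both cost and time-to-first-token.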

Have you used Assistants API v2 with streaming? If that’s still too slow, maybe try Chat Completions + RAG instead of assistants.
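Assistants v2 streaming could be sketched like this, assuming the v1 openai-python SDK, whose `runs.stream(...)` helper exposes a `text_deltas` iterator; the function name, the `assistant_id`, and the prompt are placeholders, and `client` is assumed to be an `openai.OpenAI()` instance:

```python
def stream_assistant_reply(client, assistant_id: str, question: str) -> str:
    """Stream an Assistants API (v2) run so tokens appear as they are
    generated, instead of polling until the run is finished."""
    # Create a thread seeded with the user's question.
    thread = client.beta.threads.create(
        messages=[{"role": "user", "content": question}]
    )
    parts = []
    # The streaming helper yields plain-text deltas as they arrive.
    with client.beta.threads.runs.stream(
        thread_id=thread.id, assistant_id=assistant_id
    ) as stream:
        for delta in stream.text_deltas:
            print(delta, end="", flush=True)
            parts.append(delta)
    return "".join(parts)
```

Streaming does not change total generation time, but it greatly improves perceived latency because the first words show up almost immediately.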

To quantify “slow”: I just used the Assistants API to ask gpt-4o-mini “I need to solve the equation 3x + 11 = 14. Can you help me?”

It took four minutes to respond.

I waited a few minutes and ran it again; the second time it was done in 28 seconds.

I’m not using tools or vector stores or any other messages.

I don’t mind slow for a beta product, but this sort of speed makes it pretty hard to even experiment with the thing.

Is anyone experiencing awfully slow GPT-4o speeds today? My assistants are taking several minutes to complete simple, short text-based requests.

I just used my gpt-4o-mini-powered chatbot and it returned the correct answer in about 2 seconds (on the first try).

So I believe the answer to this is “use Chat Completions” and local functions if you want performance.
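“Local functions” here means Chat Completions tool calling. A minimal one-round sketch, under assumptions: the `run_with_tools` helper and the model default are illustrative, `client` is assumed to be an `openai.OpenAI()` instance (openai-python v1), and `tool_impls` maps tool names to local Python callables:

```python
import json


def run_with_tools(client, messages, tools, tool_impls, model="gpt-4o-mini"):
    """One round of Chat Completions tool calling: if the model asks for a
    local function, run it and send the result back for a final answer."""
    resp = client.chat.completions.create(model=model, messages=messages, tools=tools)
    msg = resp.choices[0].message
    if not msg.tool_calls:  # the model answered directly
        return msg.content
    messages.append(msg)  # echo the assistant's tool-call turn back
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = tool_impls[call.function.name](**args)
        messages.append(
            {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}
        )
    final = client.chat.completions.create(model=model, messages=messages)
    return final.choices[0].message.content
```

Running the function locally and sending only its result back keeps the whole exchange to two quick requests, with no run polling in between.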

I wrote a little page to check the response time (hourly) to make it easy to check back and see if it’s fast yet.

Right now it’s consistently around 2 seconds for the fastest-possible call.
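A latency probe like that page might be built around a helper such as this (hypothetical sketch; point `fn` at the cheapest possible API call, and leave the hourly scheduling to cron or similar):

```python
import time


def time_call(fn, *args, **kwargs):
    """Run one call and return (result, elapsed_seconds).

    perf_counter is a monotonic high-resolution clock, so it is safe for
    measuring short durations even if the system clock is adjusted."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

Logging these timings over days makes it easy to tell a platform-wide slowdown from a one-off cold start like the four-minute response above.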

I’m seeing the same issues without adding documents to the assistant.

It is taking 10 to 15 seconds for a two-line response, and this is just a normal chat, so I am not sure of the reason. A month ago it was giving the same response in 3 to 4 seconds. I wonder what feature addition to assistants increased the overall response time.