Assistants API too slow for realtime/production?

You’d still get thread-like functionality if you used instructions/system prompts instead of Assistants, right?
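Roughly what I have in mind is keeping the message history myself and resending it each turn. A minimal sketch with the Python SDK (the model name and prompts are just placeholders):

```python
from openai import OpenAI

client = OpenAI()

# The "thread" is just a list of messages kept client-side and resent every turn.
history = [
    # This system prompt plays the role of the Assistant's instructions.
    {"role": "system", "content": "You are a helpful support assistant."},
]

def ask(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=history,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(ask("What were we talking about?"))
```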

Has anyone noticed the assistant now taking longer than 5 minutes and ending the calls? This is bad, and we are still getting charged for all the failures. Oddly, the file assistant seems to work faster than just pasting a bunch of text, which doesn’t make much sense.

1 Like

Same experience here. I have an implementation that does function calling but the Assistants API takes a long time to respond. Response time varies from seconds to even minutes.

1 Like

I’m having the same issue. It’s a simple text input that runs through the assistant instructions, so it’s all text, nothing else, going through a custom action in a no-code solution via JSON. I first went crazy for days thinking my code wasn’t working, until I realized after a minute that it had magically worked; then I knew it was a delay in the response. Is there a fix? Does using GPT-3.5 improve it somehow? Thanks.

1 Like

GPT-3.5 doesn’t improve this; we tried as well. The Assistants API is unfortunately a really poor solution right now, as it is not commercially viable, so what is the point… Some generations take 2 minutes to produce a result; if we are lucky, 20-30 seconds. And that is still too long.

Chatnode.ai has managed to solve this somehow, though; theirs is lightning fast. If anyone knows how they managed it, please post here, as that would be very helpful. Otherwise we are shelving what we have built for our client until OpenAI fixes the speed.

2 Likes

Looking at the ChatNode AI Lifetime Deal and ChatNode AI Review, it seems that chatnode.ai also uses the OpenAI API in the background.

Perhaps they don’t use the Assistants API but simply use some kind of vector database with RAG over their own data? (Something like the sketch below.)

Their implementation looks promising, but it seems too expensive for me.
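A bare-bones version of that kind of RAG-over-Chat-Completions flow might look roughly like this. This is a sketch only: the model names, the tiny in-memory "vector store", and the sample documents are all assumptions standing in for a real setup.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

EMBED_MODEL = "text-embedding-3-small"  # assumption: any embedding model works here
CHAT_MODEL = "gpt-3.5-turbo"            # placeholder model name

# Stand-in documents; a real system would chunk and store its own data.
documents = [
    "Our refund policy allows returns within 30 days.",
    "Support is available Monday to Friday, 9am-5pm.",
]

def embed(texts):
    resp = client.embeddings.create(model=EMBED_MODEL, input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vectors = embed(documents)  # tiny in-memory "vector database"

def answer(question: str) -> str:
    q_vec = embed([question])[0]
    # Cosine similarity against every stored chunk, keep the best match.
    scores = doc_vectors @ q_vec / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_vec)
    )
    context = documents[int(scores.argmax())]
    resp = client.chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

print(answer("Can I return an item after two weeks?"))
```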

I did a very basic use case, in all honesty, and it is just too slow. A paid service, even if in beta, should do a little better.

2 Likes

I have been using Assistants for my product for 2 months now and it is getting slower and slower. I only use about 1-1.5k tokens for each thread (no extra tools) and it is already too slow. The performance is so unstable that my program will time out out of nowhere and then work the next time I test-run it. I hope further improvements will be made, because as it is now, I definitely cannot use Assistants in production. Better to make do with Chat Completions in the meantime.

2 Likes

@ManhNguyen, would you be willing to share more about your function calling? I can’t seem to get it to work.

The API is too slow for production: the p99 response time can be over 3 minutes, and it does not support streaming, which is really bad for user experience.
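For comparison, Chat Completions does support streaming, so at least partial output can be shown while waiting. A minimal sketch with the Python SDK, with the model name as a placeholder:

```python
from openai import OpenAI

client = OpenAI()

# stream=True yields tokens as they arrive instead of waiting out the full latency.
stream = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize our conversation so far."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```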

Best not to use experimental tech that you don’t control in production environments.

2 Likes

Not sure what happened to the API team, but it seems like they quit or went on vacation. Nothing substantive has happened since the November launches. The Assistants API is slow, has frustrating limits, and doesn’t support basic things like streaming.
I had committed to this API, knowing that it increased our vendor lock-in, because OpenAI had done a good job in the past of keeping the pace up. It seems they’ve lost momentum, so I’m abandoning the Assistants API and backtracking to Chat Completions. This also allows for a much easier transition to competing models like Claude 3 or Mistral Large if we want to go in that direction.

9 Likes

@willer, do “Claude” or “Mistral” have something like the Assistants API, or do we have to build our own boring RAG system and vector database and all that stuff?

Hi everyone,
I have the same problem and I am thinking of abandoning these APIs to their fate. For an inquiry this evening I got a reply after 15 seconds (“recognize_seconds”: 15.397772960015573). I don’t know what to do with a product that, after months, is still not up to the consumer market.

2 Likes

It might be time to try something else, unfortunately. Assistants might very well always be “experimental”.

2 Likes

If you need code generation as well, try TaskWeaver (https://microsoft.github.io/TaskWeaver/). It might be considered an Assistants replacement (simple RAG, code interpreter, plugins/functions), and you can try fast Groq inference via LiteLLM (I haven’t tried it yet, so please share your experience if you do). And it’s a Microsoft project, so it won’t be abandoned anyway.
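If anyone wants to try the Groq-via-LiteLLM route, the call is roughly shaped like this. This is a sketch: the model id and the environment variable should be checked against the current LiteLLM and Groq docs.

```python
import os
from litellm import completion

# LiteLLM exposes many providers behind one Chat Completions-style interface.
os.environ["GROQ_API_KEY"] = "your-key-here"  # assumption: key is read from the environment

response = completion(
    model="groq/llama3-8b-8192",  # assumption: example Groq model id
    messages=[{"role": "user", "content": "Explain RAG in one sentence."}],
)

print(response.choices[0].message.content)
```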

1 Like

Has anyone tried Assistants on Azure? Did you notice any speed improvements?

1 Like

I was using Chat Completions and function calling to get analytics data from internal APIs for a chat assistant serving my existing platform users. Users basically ask questions like:

“Tell me about my website stats today.”
“Tell me more about specific user given email or user id.”

The chat assistant is smart enough to choose the right endpoint(s) and retrieve the information. It takes about 5-7 seconds to complete, which is acceptable.

There are problems when the assistant couldn’t answer, or when users ask things other than the analytical data. Even if you ask “Are you human?”, the function calling will reply with “None”. I have replaced the None response with “I am just an AI chatbot trained on your analytics data…” whenever no answer is found from the APIs.
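Roughly the pattern, sketched below. The tool name, the stubbed analytics result, and the fallback string are placeholders standing in for my internal endpoints and wording.

```python
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_site_stats",  # hypothetical internal endpoint wrapper
        "description": "Return today's website analytics for the current user.",
        "parameters": {"type": "object", "properties": {}, "required": []},
    },
}]

FALLBACK = "I am just an AI chatbot trained on your analytics data..."

def handle(question: str) -> str:
    first = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[{"role": "user", "content": question}],
        tools=tools,
    )
    msg = first.choices[0].message
    if not msg.tool_calls:  # off-topic question, e.g. "Are you human?"
        return msg.content or FALLBACK
    # Call the internal API (stubbed here) and feed the result back to the model.
    result = {"visits": 1234, "bounce_rate": 0.41}
    second = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "user", "content": question},
            msg,  # assistant message containing the tool call
            {"role": "tool", "tool_call_id": msg.tool_calls[0].id,
             "content": json.dumps(result)},
        ],
        tools=tools,
    )
    return second.choices[0].message.content or FALLBACK
```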

Then I switched to the Assistants API. It’s pretty smart and able to deliver insights from the function calling APIs, and also able to answer generic questions, but it takes about 20-30 seconds to complete. I can see that there have been hours of major outages for the OpenAI API recently. I am wondering if the infrastructure is overloaded most of the time and the servers are just flooded with tasks, struggling to complete all of them.

I hope the slowness of Assistants is just temporary, or some infrastructure threshold that can be solved over time, and not a consequence of the fundamental design of the Assistants API.

1 Like

@UXsniff
My scenario and experience are similar to yours with Assistants: calling internal APIs in the functions… From what I observe (not always, but most often), the time that the submitToolOutputs call takes to finish is very long. The size of the output message does not seem to matter (and mine are typically small).
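For reference, this is roughly how I time it, as a sketch against the beta Assistants endpoints; the ids and the output value are placeholders.

```python
import time
from openai import OpenAI

client = OpenAI()

def submit_and_wait(thread_id: str, run_id: str, tool_call_id: str, output: str) -> float:
    """Time the gap between submitting tool outputs and the run settling."""
    start = time.monotonic()
    client.beta.threads.runs.submit_tool_outputs(
        thread_id=thread_id,
        run_id=run_id,
        tool_outputs=[{"tool_call_id": tool_call_id, "output": output}],
    )
    # The run keeps working after the submit call returns, so poll until it settles.
    while True:
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
        if run.status in ("completed", "failed", "cancelled", "expired"):
            break
        time.sleep(1)
    return time.monotonic() - start
```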

I hope someone from the Assistants dev team follows this community and will respond on this topic with some insight and suggestions on what can be done at the program level.

Ideally they will say “Here is the magic bullet” 🙂

Cheers…
Deva

4 Likes

People have been providing feedback on the same issue for a year, but not the slightest improvement has been made. In fact, the situation is continuously worsening. Now I can’t get a response in less than 150 seconds. This product is not usable in this condition. As an alternative, Google Gemini works extremely fast. Although it doesn’t have an ‘assistant’ feature, you can use it in a similar way with the correct instructions.
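For anyone who wants to try that route, the Gemini Python SDK lets you attach instructions and keep chat history in a roughly similar way. A sketch; the model name is only an example and should be checked against what is currently available.

```python
import google.generativeai as genai

genai.configure(api_key="your-key-here")

# system_instruction plays roughly the same role as Assistant instructions.
model = genai.GenerativeModel(
    model_name="gemini-1.5-flash",  # example model name
    system_instruction="You are a support assistant for our analytics product.",
)

chat = model.start_chat(history=[])  # the SDK keeps the thread history for you
reply = chat.send_message("Tell me about my website stats today.")
print(reply.text)
```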

1 Like