Assistants API too slow for realtime/production?

You’d still get thread-like functionality if you used instructions/system prompts instead of Assistants, right?
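Roughly what I have in mind is keeping the message history myself and resending it each turn. A minimal sketch with the Python SDK (the model name and prompts are just placeholders):

```python
from openai import OpenAI

client = OpenAI()

# The "thread" is just a list of messages kept client-side and resent every turn.
history = [
    # This system prompt plays the role of the Assistant's instructions.
    {"role": "system", "content": "You are a helpful support assistant."},
]

def ask(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=history,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(ask("What were we talking about?"))
```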

Has anyone noticed the assistant now taking longer than 5 minutes and ending the calls? This is bad, and we are still getting charged for all the failures. Oddly, the file assistant seems to work faster than just pasting a bunch of text, which doesn’t make much sense.

1 Like

Same experience here. I have an implementation that does function calling but the Assistants API takes a long time to respond. Response time varies from seconds to even minutes.

1 Like

I’m having the same issue. It’s a simple text input that runs through the assistant instructions, so it’s all text, nothing else, going through a custom action in a no-code solution via JSON. I first went crazy for days thinking my code wasn’t working, until I realized after a minute that it had magically worked; then I knew it was a delay in the response. Is there a fix? Does using GPT-3.5 improve it somehow? Thanks.

1 Like

GPT-3.5 doesn’t improve this; we tried as well. The Assistants API is unfortunately a really poor solution right now, as it is not commercially viable, so what is the point… Some generations take 2 minutes to produce a result; if we are lucky, 20-30 seconds. And that is still too long.

Chatnode.ai has managed to solve this somehow, though; theirs is lightning fast. If anyone knows how they managed it, please post here, as that would be very helpful. Otherwise we are shelving what we have built for our client until OpenAI fixes the speed.

2 Likes

Looking at the ChatNode AI Lifetime Deal and ChatNode AI Review, it seems that chatnode.ai also uses the OpenAI API in the background.

Perhaps they don’t use the Assistants API but simply use some kind of vector database with RAG over their own data? (Something like the sketch below.)

Their implementation looks promising, but it seems too expensive for me.
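A bare-bones version of that kind of RAG-over-Chat-Completions flow might look roughly like this. This is a sketch only: the model names, the tiny in-memory "vector store", and the sample documents are all assumptions standing in for a real setup.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

EMBED_MODEL = "text-embedding-3-small"  # assumption: any embedding model works here
CHAT_MODEL = "gpt-3.5-turbo"            # placeholder model name

# Stand-in documents; a real system would chunk and store its own data.
documents = [
    "Our refund policy allows returns within 30 days.",
    "Support is available Monday to Friday, 9am-5pm.",
]

def embed(texts):
    resp = client.embeddings.create(model=EMBED_MODEL, input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vectors = embed(documents)  # tiny in-memory "vector database"

def answer(question: str) -> str:
    q_vec = embed([question])[0]
    # Cosine similarity against every stored chunk, keep the best match.
    scores = doc_vectors @ q_vec / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_vec)
    )
    context = documents[int(scores.argmax())]
    resp = client.chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

print(answer("Can I return an item after two weeks?"))
```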

I did a very basic use case, in all honesty, and it is just too slow. A paid service, even if in beta, should do a little better.

2 Likes

I have been using Assistants for my product for 2 months now and it is getting slower and slower. I only use about 1-1.5k tokens for each thread (no extra tools) and it is already too slow. The performance is so unstable that my program will time out out of nowhere and then work the next time I test-run it. I hope further improvements will be made, because as it is now, I definitely cannot use Assistants in production. Better to make do with Chat Completions in the meantime.

2 Likes

@ManhNguyen, would you be willing to share more about your function calling? I can’t seem to get it to work.

The API is too slow for production: the p99 response time can be over 3 minutes, and it does not support streaming, which is really bad for user experience.
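For comparison, Chat Completions does support streaming, so at least partial output can be shown while waiting. A minimal sketch with the Python SDK, with the model name as a placeholder:

```python
from openai import OpenAI

client = OpenAI()

# stream=True yields tokens as they arrive instead of waiting out the full latency.
stream = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize our conversation so far."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```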

Best not to use experimental tech that you don’t control in production environments.

2 Likes

Not sure what happened to the API team, but it seems like they quit or went on vacation. Nothing substantive has happened since the November launches. The Assistants API is slow, has frustrating limits, and doesn’t support basic things like streaming.
I had committed to this API, knowing that it increased our vendor lock-in, because OpenAI had done a good job in the past of keeping the pace up. It seems they’ve lost momentum, so I’m abandoning the Assistants API and backtracking to Chat Completions. This also allows for a much easier transition to competing models like Claude 3 or Mistral Large if we want to go in that direction.

9 Likes

@willer, do “Claude” or “Mistral” have something like the Assistants API, or do we have to build our own boring RAG system and vector database and all that stuff?

Hi everyone,
I have the same problem and I am thinking of abandoning these APIs to their fate. For an inquiry this evening I got a reply after 15 seconds (“recognize_seconds”: 15.397772960015573). I don’t know what to do with a product that, after months, is still not up to the consumer market.

2 Likes

It might be time to try something else, unfortunately. Assistants might very well always be “experimental”.

2 Likes

If you need code generation as well, try TaskWeaver (https://microsoft.github.io/TaskWeaver/). It might be considered an Assistants replacement (simple RAG, code interpreter, plugins/functions), and you can try fast Groq inference via LiteLLM (I haven’t tried it yet, so please share your experience if you do). And it’s a Microsoft project, so it won’t be abandoned anyway.
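If anyone wants to try the Groq-via-LiteLLM route, the call is roughly shaped like this. This is a sketch: the model id and the environment variable should be checked against the current LiteLLM and Groq docs.

```python
import os
from litellm import completion

# LiteLLM exposes many providers behind one Chat Completions-style interface.
os.environ["GROQ_API_KEY"] = "your-key-here"  # assumption: key is read from the environment

response = completion(
    model="groq/llama3-8b-8192",  # assumption: example Groq model id
    messages=[{"role": "user", "content": "Explain RAG in one sentence."}],
)

print(response.choices[0].message.content)
```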

1 Like

Has anyone tried Assistants on Azure? Did you notice any speed improvements?

1 Like

I was using Chat Completions and function calling to get analytics data from internal APIs for a chat assistant serving my existing platform users. Users basically ask questions like:

“Tell me about my website stats today.”
“Tell me more about specific user given email or user id.”

The chat assistant is smart enough to choose the right endpoint(s) and retrieve the information. It takes about 5-7 seconds to complete, which is acceptable.

There are problems when the assistant couldn’t answer, or when users ask things other than the analytical data. Even if you ask “Are you human?”, the function calling will reply with “None”. I have replaced the None response with “I am just an AI chatbot trained on your analytics data…” whenever no answer is found from the APIs.
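Roughly the pattern, sketched below. The tool name, the stubbed analytics result, and the fallback string are placeholders standing in for my internal endpoints and wording.

```python
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_site_stats",  # hypothetical internal endpoint wrapper
        "description": "Return today's website analytics for the current user.",
        "parameters": {"type": "object", "properties": {}, "required": []},
    },
}]

FALLBACK = "I am just an AI chatbot trained on your analytics data..."

def handle(question: str) -> str:
    first = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[{"role": "user", "content": question}],
        tools=tools,
    )
    msg = first.choices[0].message
    if not msg.tool_calls:  # off-topic question, e.g. "Are you human?"
        return msg.content or FALLBACK
    # Call the internal API (stubbed here) and feed the result back to the model.
    result = {"visits": 1234, "bounce_rate": 0.41}
    second = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "user", "content": question},
            msg,  # assistant message containing the tool call
            {"role": "tool", "tool_call_id": msg.tool_calls[0].id,
             "content": json.dumps(result)},
        ],
        tools=tools,
    )
    return second.choices[0].message.content or FALLBACK
```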

Then I switched to the Assistants API. It’s pretty smart and able to deliver insights from the function calling APIs, and also able to answer generic questions, but it takes about 20-30 seconds to complete. I can see that there have been hours of major outages for the OpenAI API recently. I am wondering if the infrastructure is overloaded most of the time and the servers are just flooded with tasks, struggling to complete all of them.

I hope the slowness of Assistants is just temporary, or some infrastructure threshold that can be solved over time, and not a consequence of the fundamental design of the Assistants API.

1 Like

@UXsniff
My scenario and experience are similar to yours with Assistants: calling internal APIs in the functions… From what I observe (not always, but most often), the time that the submitToolOutputs call takes to finish is very long. The size of the output message does not seem to matter (and mine are typically small).
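For reference, this is roughly how I time it, as a sketch against the beta Assistants endpoints; the ids and the output value are placeholders.

```python
import time
from openai import OpenAI

client = OpenAI()

def submit_and_wait(thread_id: str, run_id: str, tool_call_id: str, output: str) -> float:
    """Time the gap between submitting tool outputs and the run settling."""
    start = time.monotonic()
    client.beta.threads.runs.submit_tool_outputs(
        thread_id=thread_id,
        run_id=run_id,
        tool_outputs=[{"tool_call_id": tool_call_id, "output": output}],
    )
    # The run keeps working after the submit call returns, so poll until it settles.
    while True:
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
        if run.status in ("completed", "failed", "cancelled", "expired"):
            break
        time.sleep(1)
    return time.monotonic() - start
```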

I hope someone from the Assistants dev team follows this community and will respond on this topic with some insight and suggestions on what can be done at the program level.

Ideally they will say “Here is the magic bullet” 🙂

Cheers…
Deva

4 Likes

People have been providing feedback on the same issue for a year, but not the slightest improvement has been made. In fact, the situation is continuously worsening. Now I can’t get a response in less than 150 seconds. This product is not usable in this condition. As an alternative, Google Gemini works extremely fast. Although it doesn’t have an ‘assistant’ feature, you can use it in a similar way with the correct instructions.
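For anyone who wants to try that route, the Gemini Python SDK lets you attach instructions and keep chat history in a roughly similar way. A sketch; the model name is only an example and should be checked against what is currently available.

```python
import google.generativeai as genai

genai.configure(api_key="your-key-here")

# system_instruction plays roughly the same role as Assistant instructions.
model = genai.GenerativeModel(
    model_name="gemini-1.5-flash",  # example model name
    system_instruction="You are a support assistant for our analytics product.",
)

chat = model.start_chat(history=[])  # the SDK keeps the thread history for you
reply = chat.send_message("Tell me about my website stats today.")
print(reply.text)
```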

1 Like