Assistants API is too slow!

Love the features of the Assistants API, hate the speed of response. It’s painfully slow compared to rolling your own, which for me is orders of magnitude faster to first token - but you then lose file_search, integrated attachments and code_interpreter.

OpenAI moderators / representatives - when can we expect to see something that rivals just using chat completions?

Alternatively, when might we see the file service, file_search and code interpreter as tools for completions?

I’d love to hear from others: have you developed a solution to this problem?

Hm, it shouldn’t really be any slower - are you sure you’re using it correctly?

I’m not sure of anything, but I’ve been working with it since the first release, and I’m compliant with the documentation.

Are others not finding this an issue?

In my experience there are times when it IS indeed much slower. It typically goes away. In general, after using it for over a year now, I would say it is very stable/consistent. I’d love to learn a little bit more about your use cases.

Multi-use, custom agent framework.

My baseline is an OAI Assistant based service, which I’ve extended over time to first support completions, which in turn has been extended to support different models from different providers.

The framework is multi-surface for agent interactions (web, mobile app, iOS app and Text messaging).

Lots of integrations and an API endpoint for further extension.

There are ~35 custom function_tools, so it’s a pretty heavy use case.

I do find the Assistants framework sometimes becomes VERY slow and then recovers, but it is generally much slower - when I benchmark completions vs. Assistants, completions return the first token in under 1 second, whereas Assistants consistently take 3-5 seconds to return the first token.
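For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of measuring time-to-first-token. The helper name and the commented-out SDK call are illustrative assumptions, not code from the original post; the helper works on any iterable of text chunks, so the same measurement applies to a completions stream or an Assistants run stream.

```python
import time

def time_to_first_token(chunks):
    """Return (seconds_to_first_chunk, full_text) for any iterable of text chunks.

    Works with any streaming source: an OpenAI chat-completions stream,
    an Assistants run stream, or a plain generator in tests.
    """
    start = time.monotonic()
    ttft = None
    parts = []
    for chunk in chunks:
        if ttft is None:
            ttft = time.monotonic() - start  # first token has arrived
        parts.append(chunk)
    return ttft, "".join(parts)

# Hypothetical usage against the OpenAI Python SDK (requires an API key):
# from openai import OpenAI
# client = OpenAI()
# stream = client.chat.completions.create(
#     model="gpt-4o-mini", stream=True,
#     messages=[{"role": "user", "content": "Hello"}],
# )
# ttft, text = time_to_first_token(
#     c.choices[0].delta.content or "" for c in stream
# )
```

Running the same helper against both APIs with the same prompt gives an apples-to-apples first-token comparison.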

Interesting observation.
I run about 30 different Assistants with 30 different functions; none of them require ‘realtime’, so I really only notice the difference when I am working/testing on them. I assume there is an update in the works for supporting things like o1 that will bring them up to par again. With as many functions as we have, I find completions so much harder to manage than Assistants.

I have a good framework (out of necessity) for completions, but the OAI file/RAG service is magical and code interpreter is very cool too, so I would simply love for the Assistants API to be faster 🙂 It really shouldn’t be as slow as it is.

This is so true and frustrating at the same time. It seems to have gotten worse recently; sometimes it just stops working and stays in the “run” state for a minute or more. We’ve had complaints from a few customers about it.

But the next time we try, it works normally again.

The issue is that it degrades and then recovers on its own, yet the OpenAI status page shows no degraded-performance notification, nor is there any other communication from them.

The bottom line is that using the Assistants API in production is really risky because of these intermittent disturbances.

Do you agree?

I have an assistant with file search that can take anywhere from 1 to 10 minutes to give an answer (it was like that yesterday and today). It works fine without file search. This is a problem because you have to move instructions from files into the prompt, which increases the price per response.

For me it was working fine until yesterday, but today the assistant doesn’t seem to be referencing the files I have uploaded in the vector store anymore before responding. Anyone else experiencing this issue?

I’ve experienced the same for the last 2 days: occasional severe slowness of the Assistants API, even on non-LLM-generation endpoints like adding a message, while the Completions API works fine.
The OpenAI status page is silent, but maybe the team can look into it.

I see three separate classes of issues:

  1. Speed of non-LLM response - Create Thread, Create Run. These take a crazy amount of time in my experience for operations that should be near instantaneous. My original post was really focused on this.

  2. Run Issues - Sometimes Runs will crash / fail for no good reason. Not a weird query, just a crash that stops the run - restarting the run usually solves the issue in my experience and can mostly be handled with a retry.

  3. File upload & File Search Issues - From time to time, the upload will just stop working with certain types of files - which can cripple your service if it relies on them. I guess there’s some sanitization that needs to happen (I rarely see this issue with the ChatGPT product), and I wish that either 1) we could know what is needed, or 2) that sanitization would happen at the point of file upload when the purpose is Assistants.
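The retry workaround from point 2 can be sketched as a generic wrapper with exponential backoff. This is an illustrative pattern, not code from the post; in practice `fn` would create and poll an Assistants run, raising when the run ends in a `failed` or `expired` state.

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(); on exception, retry with exponential backoff.

    Re-raises the last exception once all attempts are exhausted,
    mirroring the 'restart the run usually solves it' workaround.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

A small jitter on the delay is worth adding in production so many clients do not retry in lockstep during a platform-wide slowdown.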

@logankilpatrick please check this out 👆 and give us some thoughts/guidance.

I have the same experience. The same question to the two different APIs: with chat completions, 1-3 seconds; with an assistant, 6-10 seconds. An assistant is by its nature dialogic, with a lot of interaction between user and assistant, and 6-10 seconds per response is unthinkable. I have no file search, just a simple prompt. I work on the same thread… but with an empty thread it is the same.

You probably already know this, but you can just set up k8s with the right software stack - all free except the k8s, unless you have your own hardware for it - and call the OpenAI API yourself without using Assistants or completions.

My stack has:
- Flowise and n8n for easy low-code agent building
- PostgreSQL + pgvector for hybrid semantic RAG
- Qdrant for pure vector RAG
- Redis for webhook and process queue management and caching
- E2B for code_interpreter
- Langfuse to monitor agent metrics and token/API cost
- Traefik to handle the front-end proxy
- cert-manager/Let’s Encrypt for certificate management

Since I’m old and don’t want to live in a command line:
- Prometheus + Grafana
- pgAdmin 4
- RedisInsight
- Argo CD + Flux CD for GitOps

That’s pretty much it right there.
I use 4o mini and Claude for everything and just keep my credits topped up.
Response time is instant.

If you want to build more agents that aren’t really chatbot-centric, you can install Pydantic.

That’s just my take on it. I tried OpenAI Assistants and it just didn’t really work the way I wanted it to, so I figured out how to build this stack, and now I have 100% control over absolutely everything and only use OpenAI for the inference.
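In miniature, the “bring your own RAG, use OpenAI only for inference” pattern described above looks something like this. The document store and embeddings here are toy placeholders, not part of the stack listed in the post; in the real setup the lookup would be a pgvector or Qdrant query and the vectors would come from an embeddings model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, store, k=2):
    """Return the k documents whose embeddings are most similar to query_vec.

    'store' is a list of (doc_text, embedding) pairs standing in for a
    pgvector/Qdrant index.
    """
    ranked = sorted(store, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# The retrieved chunks are then stuffed into the prompt of a plain
# chat-completions call, so only inference goes through OpenAI.
```

The point is that retrieval, queueing, and observability all live in infrastructure you control; the model call is the only external dependency left.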

Are you using streaming?
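For context on why streaming matters here: it hides much of the latency by showing tokens as they arrive instead of waiting for the full run. A minimal sketch, assuming the OpenAI Python SDK’s streaming runs (the SDK call is shown in the docstring because it needs an API key; the helper itself works on any iterable of text deltas):

```python
def stream_and_collect(deltas, sink=print):
    """Forward each text delta to sink as it arrives and return the full reply.

    With the OpenAI SDK the deltas would come from a streaming
    Assistants run, e.g. (requires an API key):

        with client.beta.threads.runs.stream(
            thread_id=thread.id, assistant_id=assistant.id
        ) as stream:
            reply = stream_and_collect(stream.text_deltas)
    """
    parts = []
    for delta in deltas:
        sink(delta)        # user sees tokens immediately
        parts.append(delta)
    return "".join(parts)
```

Streaming does not change total generation time, but it turns a 3-5 second silent wait into visible progress, which is usually what users actually notice.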