Love the features of the Assistants API, hate the speed of response. It’s painfully slow by comparison to rolling your own, which for me is orders of magnitude faster to first token - but you then lose file_search, integrated attachments and code_interpreter.
OpenAI moderators / representatives - when can we expect to see something that rivals just using chat completions?
Alternatively, when might we see the file service, file_search and code interpreter as tools for completions?
Would love to hear from others who have developed a solution for this problem.
In my experience there are times when it IS indeed much slower, but that typically goes away. In general, after using it for over a year now, I would say its performance has been very stable. Would love to learn a little bit more about your use cases.
My baseline is an OAI Assistant based service, which I’ve extended over time to first support completions, which in turn has been extended to support different models from different providers.
The framework is multi-surface for agent interactions (web, mobile app, iOS app and Text messaging).
Lots of integrations and an api endpoint for further extension.
There are ~35 custom function_tools, so it’s a pretty heavy use case.
I do find the Assistants framework sometimes becomes VERY slow and then recovers, but it is also much slower in general - when I benchmark completions vs. Assistants, completions return the first token in <1s, whereas Assistants consistently take 3-5 seconds to return the first token.
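For anyone who wants to reproduce that kind of comparison, here is a minimal sketch of a time-to-first-token measurement. It is not the benchmark used above, just one way to do it: `time_to_first_token` and `fake_stream` are hypothetical names, and `fake_stream` is a stand-in you would replace with your own streaming completions or Assistants call.

```python
import time
from typing import Callable, Iterable, Iterator


def time_to_first_token(start_stream: Callable[[], Iterable[str]]) -> float:
    """Return seconds from issuing the request until the first chunk arrives.

    start_stream is any zero-argument callable that starts a streaming
    request and returns an iterable of chunks.
    """
    started = time.perf_counter()
    for _chunk in start_stream():
        # Stop at the very first chunk; we only care about latency to it.
        return time.perf_counter() - started
    raise RuntimeError("stream ended without producing any tokens")


def fake_stream(delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for an API stream: waits, then yields tokens."""
    time.sleep(delay_s)
    yield "Hello"
    yield " world"


if __name__ == "__main__":
    fast = time_to_first_token(lambda: fake_stream(0.05))
    slow = time_to_first_token(lambda: fake_stream(0.20))
    print(f"fast: {fast:.2f}s, slow: {slow:.2f}s")
```

Running the same prompt through both APIs a few dozen times and comparing the medians gives a fairer picture than a single call, since the slowdowns come and go.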
Interesting observation.
I run about 30 different Assistants with 30 different functions, and none of them require ‘realtime’ - so I really only notice the difference when I am working on or testing them. I assume there is an update in the air for supporting things like o1 that will bring them up to par again. With as many functions as we have, I find completions so much harder to manage than Assistants.
I have a good framework (out of necessity) for completions, but the OAI file/RAG service is magical and code interpreter is very cool too, so I would simply love for the Assistants API to be faster. It really shouldn’t be as slow as it is.
This is so true and frustrating at the same time. It seems to have gotten worse recently: sometimes it just stops working and stays in the “run” state for as long as a minute. We got complaints from a few customers about it.
But the next time we try, it is working in the normal state.
The issue is that it gets worse on its own and then recovers, but the OpenAI status page shows no degraded-performance notice, nor is there any other communication from them.
Bottom line: using the Assistants API in production is really risky because of these intermittent disruptions.
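One way to keep a stuck run from hanging a customer request is to poll with a hard deadline and give up if the run never reaches a final state. The sketch below is only an illustration under assumptions: `wait_for_run` and `get_status` are hypothetical names, `get_status` stands for whatever call you use to fetch the run’s current status, and the status strings in `TERMINAL_STATES` are assumed rather than taken from this thread.

```python
import time
from typing import Callable

# Assumed names for final run states; adjust to what your client reports.
TERMINAL_STATES = {"completed", "failed", "cancelled", "expired"}


def wait_for_run(
    get_status: Callable[[], str],
    timeout_s: float = 30.0,
    poll_interval_s: float = 0.5,
) -> str:
    """Poll get_status until the run finishes or timeout_s elapses.

    Returns the final status, or raises TimeoutError if the run is still
    not in a terminal state when the deadline passes.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still '{status}' after {timeout_s}s")
        time.sleep(poll_interval_s)
```

On `TimeoutError` the caller can cancel the run and show the user a fallback message instead of waiting indefinitely.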
I have an assistant with file search that can take anywhere from 1 to 10 minutes to give me an answer (it was like that yesterday and today). It works fine without file search. This is a problem because you have to move instructions from the files into the prompt, which increases the price per response.
For me it was working fine until yesterday, but today the assistant doesn’t seem to be referencing the files I have uploaded in the vector store anymore before responding. Anyone else experiencing this issue?
Experiencing the same for the last 2 days. Occasional severe slowness of the Assistants API, even on non-LLM endpoints like adding a message, while the completions API works fine.
The OpenAI status page is silent, but maybe the team can look into it.
Speed of non-LLM response - Create Thread, Create Run. These take a crazy amount of time in my experience for operations that should be near instantaneous. My original post was really focused on this.
Run Issues - Sometimes Runs will crash / fail for no good reason. Not a weird query, just a crash that stops the run - restarting the run usually solves the issue in my experience and can mostly be handled with a retry.
File upload & File Search Issues - From time to time, the upload will just stop working with certain types of files - which can crush your service if it relies on it. I guess that there’s some sanitization that needs to happen (as I rarely see that issue with the ChatGPT product) and I wish that either 1) we could know what is needed or 2) that sanitization would happen at the point of file upload when the purpose is Assistants.
I have the same experience. Sending the same question to the two different APIs: chat completions takes 1-3 seconds, while the assistant takes 6-10 seconds. An assistant is by nature conversational, with a lot of back-and-forth between user and assistant, and 6-10 seconds for each response is unworkable. I have no file search, just a simple prompt. I work on the same thread… but with an empty thread it is the same.
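For the crashed runs described above, a retry wrapper is usually enough. Here is a minimal sketch with exponential backoff and jitter; `with_retries` and `RunFailed` are hypothetical names, and `operation` stands for whatever function creates the run and raises `RunFailed` when it ends in a failed state.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RunFailed(Exception):
    """Hypothetical error raised by your own code when a run fails."""


def with_retries(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on RunFailed with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RunFailed:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            # Delay doubles each attempt; jitter avoids synchronized retries.
            delay = base_delay_s * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

Only retry operations that are safe to repeat. Re-creating a run on the same thread is generally fine, but re-adding the user message on every attempt would duplicate it.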
you probably already know this but you can just set up k8s with the correct software stack, all free except the k8s unless you have your own hardware for it, and just call the OpenAI API yourself without using Assistants or completions.
my stack has Flowise and n8n for easy low code agent building
postgresql + pgvector for hybrid semantic RAG
Qdrant for pure vector RAG
Redis for webhook and process queue management and caching
E2B for code_interpreter
langfuse to monitor agent metrics and token/api cost
traefik to handle front end proxy
certmanager/letsencrypt for cert management
since I’m old and don’t want to live in a command line:
Prometheus + Grafana
pgadmin4
redisinsight
argocd + fluxcd for gitops
that’s pretty much it right there
i use 4o mini and claude for everything and just keep my credits up.
response time is instant
if you want to build more agents that aren’t really chatbot centric you can install pydantic
idk, that’s my take on it. i tried OpenAI Assistants and it just didn’t really work the way i wanted it to, so i figured out how to build this stack and now i have 100% control over absolutely everything and only use OpenAI for the inferencing