Best Alternative to Assistants API?

I’m increasingly concerned about the future (or lack thereof) of the Assistants API. However, I’ve been building a platform based on it. At the center of the platform is a variety of uniquely tailored assistants, each with its own personality and knowledge defined in its instructions.

If you had to recommend an alternative to the Assistants API, what would it be? I fear I may need to begin working on a substitute as we near soft-launch readiness.

2 Likes

Did you look at the Pinecone Assistant API?

2 Likes

Because of the performance irregularities we discovered (seemingly random +15 sec delays), and because OpenAI never acknowledged our reports, we also migrated away from the Assistants API and transitioned to our own multiplex Conversation package. It has the following advantages over the Assistants API:

  1. Typically a >40% reduction in response time versus the same prompt/model/tools in the Assistants API.
  2. More control over context management (you control the number of messages; older message context is kept in a rolling summary).
  3. More options for message storage (local, database), plugin-ready.
  4. More intuitive interface that works with multiple providers (currently OpenAI and Anthropic, more coming after we finish our own product).
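The rolling-summary idea in point 2 can be sketched roughly like this. This is only an illustrative outline, not the actual package: the `RollingContext` class, its methods, and the naive truncation-based summarizer are all assumptions (a real system would ask the model to write the summary).

```python
class RollingContext:
    """Keeps the most recent messages verbatim and folds older
    ones into a running summary string (illustrative sketch)."""

    def __init__(self, max_messages=6):
        self.max_messages = max_messages
        self.summary = ""
        self.messages = []  # list of {"role": ..., "content": ...}

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})
        # When the window overflows, fold the oldest message into the
        # summary. Here we just truncate; a real implementation would
        # call the model to summarize.
        while len(self.messages) > self.max_messages:
            old = self.messages.pop(0)
            self.summary += f"{old['role']}: {old['content'][:80]}\n"

    def to_prompt(self):
        """Build the message list to send to the provider: the rolling
        summary (if any) followed by the recent verbatim messages."""
        context = []
        if self.summary:
            context.append({"role": "system",
                            "content": "Conversation so far:\n" + self.summary})
        return context + self.messages
```

The key property is that the prompt size stays bounded no matter how long the conversation runs, which is exactly the control the Assistants API doesn't give you.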

If your team is interested in licensing the package, send me a DM and I’ll schedule a demo.

4 Likes

A better way is to implement it yourself with the Completions API. That’s what we did.

4 Likes

We moved to LlamaIndex. Much more control, much less of a black box.

1 Like

Thanks, @javidd . I’m leaning that way. We’ve already built a platform that does much of the wrapper functionality we need. Have you seen a good summary of the gaps between the Completions and Assistants APIs?

I appreciate your insights.

1 Like

@blichtenwalner the main gaps and problems I have seen:

  1. With Assistants, you can’t manage context usage (for memory). After 4–5 messages you will see it using 9,000–12,000 tokens on each answer.
  2. The API works slowly if you want to save messages in your own database, and in the end you have to save them in your DB anyway. Otherwise you have to make API calls to fetch threads (which are chats), messages, etc., and that takes time. Users don’t like waiting.
  3. Maybe you want to use other models in the future, for example Claude, Grok, or Gemini. You won’t be able to use them with OpenAI Assistants.
  4. The good part: you can send files to an assistant and have it read them.

With completions:

  1. You can manage context usage.
  2. It will not be slow.
  3. You will be able to use other AI models; you just need to connect their APIs.
  4. For file reading, you will need to implement it yourself. For example, in our project we can read PDF, DOC, TXT, Excel, CSV, and PPTX documents. We have even added functionality for sending audio messages to GPT.
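The do-it-yourself file reading in point 4 boils down to converting each upload to plain text before it goes into the prompt. A minimal sketch for two simple formats (richer formats like PDF or DOCX would need libraries such as pypdf or python-docx; the `file_to_text` helper and its dispatch-by-extension design are assumptions, not code from the project above):

```python
import csv
import io

def file_to_text(filename, data: bytes) -> str:
    """Convert an uploaded file to plain text for the prompt.
    Only .txt and .csv are handled in this sketch."""
    if filename.endswith(".txt"):
        return data.decode("utf-8", errors="replace")
    if filename.endswith(".csv"):
        # Re-serialize CSV rows as readable comma-separated lines.
        rows = csv.reader(io.StringIO(data.decode("utf-8")))
        return "\n".join(", ".join(row) for row in rows)
    raise ValueError(f"unsupported file type: {filename}")
```

Whatever this returns would simply be appended to (or referenced from) the user message in the completions request.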

So by using the Completions API you have more “freedom”. It just takes a little more time than using the Assistants API.

2 Likes

Thanks for the excellent summary. I feel foolish, but what’s the benefit of Assistants, then? I went down that path and now I’m regretting it. With completions, you provide the instructions that differentiate your “personalities” and knowledge, right?

1 Like

Of course, you can give the instructions as a system prompt. The benefit of Assistants is that you may not need to implement saving chats and messages in your own database; building that functionality (saving messages, keeping chats separate, handling different message types such as text and images) takes some time.
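To make the system-prompt point concrete: each “personality” can just be a different system message prepended to the same request. The personalities and the `build_messages` helper below are made-up illustrations, not code from either platform discussed here:

```python
# Hypothetical personality definitions; real ones would carry the
# full instructions and knowledge for each assistant.
PERSONALITIES = {
    "historian": "You are a meticulous historian. Cite periods and dates.",
    "coach": "You are an upbeat fitness coach. Keep answers short.",
}

def build_messages(personality, user_text, history=None):
    """Assemble the message list for a chat completions request:
    system prompt, then prior turns, then the new user message."""
    messages = [{"role": "system", "content": PERSONALITIES[personality]}]
    messages += history or []
    messages.append({"role": "user", "content": user_text})
    return messages
```

Switching personalities is then just a matter of picking a different key; the rest of the request is identical.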

1 Like

Just speaking from my experience: I had built my own setup prior to the Assistants API, and there really is no comparison. Not because my previous implementation was somehow hampered, but because the Assistants API is that good.

You’ll regret going with something else when next month they release the next version of Assistants that fixes everything and probably has some insanely good RAG/CMS thing attached to it.

Yes, there is a “black box” that eventually pushes the token context up to 90k–128k tokens, but the end result is so much better. My users are extremely happy with the “memory” aspect of my app, often comparing it to Claude or ChatGPT and saying that the others don’t come close to “remembering” their story, which just means that it’s insanely good at drawing up context from the entire conversation.

Storing the messages in the DB is inconsequential if you make the call after the response has streamed. It takes a split second, and you can do it while they’re reading the previous response or starting to type the next one.
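The store-after-streaming pattern described here can be sketched as follows. sqlite3 and the table layout are purely illustrative assumptions; the point is only that the write happens once, after the full response has been assembled from the streamed chunks:

```python
import sqlite3

def save_message(conn, thread_id, role, content):
    """Persist one message after the full response has streamed."""
    conn.execute(
        "INSERT INTO messages (thread_id, role, content) VALUES (?, ?, ?)",
        (thread_id, role, content),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (thread_id TEXT, role TEXT, content TEXT)")

# Stand-in for streamed response deltas; in practice these would come
# from the streaming API as they arrive.
chunks = ["Hello", ", ", "world"]
save_message(conn, "t1", "assistant", "".join(chunks))
```

Since the user is already reading the response by the time this runs, the single INSERT adds no perceived latency.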

Just know that no matter which direction you go, everything will change in less than two months and then you’ll have to start all over again. :smile: