Alternatives to Assistant API

I started building out a new project with the assistants api due to built in thread and context management but after testing the MVP, I want more granular control over when I use what model, how context is summarized, max tokens, etc. Is there a planned release for at least some of these features?

If not, what is everyone else using now instead of the assistants api or langchain? Is it basically just roll your own thread and context management and then use the completions endpoint?


Yep - especially with AI code generation, getting these systems up and running is starting to become incredibly easy. We found langchain (and even the openai libs) to be an encumberance. However, it should probably be said that some of us have been doing this for quite a while now, so we might have bliners on regarding how hard or easy things are for beginners.

So it’s quite possible that starting with custom GPTs, graduating to assistants, maybe using a framework, and eventually “rolling your own” might be a reasonable path. :thinking:

1 Like

Appreciate it – I thought assistants would be sufficient but I’m going to move over to rolling my own thread and context management setup. Outside of langchain is there anything else I should consider or just crank out my own setup over a couple days?

Hmm - probably depends on your skill as a software developer.

We found Langchain to be too restrictive, but some people seem to be getting decent results with no-code tools. Perhaps a mix could be a good option depending on what you’re trying to do.

Hmm hmm hmm :thinking:

I have no solid advice here other than “stay modular and don’t skip your automated tests” :laughing:

1 Like

What do you mean with “Langchain to be too restrictive”, can you give us an example to understand why is it restrictive ?


That’s just an opinion. I would say it feels very similar to a lot of these no-code tools.

We (company) gave it a shot but found that if you want to extend something and try new things, it becomes pretty cumbersome due to a lack of easily accessible documentation, and high level of abstraction. It felt like building the necessary adapters would take more effort than building the functionality LangChain offers from scratch.

Of course if you’re a diehard LangChain fan and super comfortable with the framework and it works for you, that’s great! Maybe you can teach us the way :cowboy_hat_face:

1 Like

100%. It’s powerful, but if and once you wish to improve/optimize things it becomes a massive burden.

1 Like

All that said, a lot of money seems to be pushing LangChain (IBM, AWS)

So it’s quite possible that you can soon earn tons of money as a consultant, fixing (or rather life-supporting) sunken-cost LangChain projects. :thinking:


Huh… That’s a great thought…

Scurries to create an Assistant with LangChain docs. Just need some cardboard, a marker and I’m in business.

1 Like

As someone who’s been developing my own AI applications without Langchain or Python, I 2nd the motions above. Don’t get me wrong. I highly recommend Assistants API and/or Langchain when they are the right tools for the job.

If they aren’t, and you have (or have access someone who has) software developer skills, then building your own is a viable option. All of these systems, no matter what language, no matter what libraries, are all essentially doing the same API calls to the models – chat completion, assistants and embeddings. EVERYBODY going through the API is using the same basic references which are documented here:

I made the commitment from the start to build myself. As it turns out, the coding was the easiest part. The hardest part was understanding how everything works.

This flowchart got me started:

Certainly not everything you need to know, but a good place to start if, like I was, you don’t know anything.

1 Like

I’m working on configurable assistants that use standard Chat Completion APIs in the background. I’m interested in what your expectations would be regarding configurability. The prompt and the way the conversation is summarized are clear to me. What I’m more curious about is the configurability of RAG - what aspects should be adjustable. Also, regarding the code interpreter - whether it should be possible to set which libraries are installed, and if it should support languages like Python and JavaScript. What are your expectations?

One approach is to avoid using any abstractions, except perhaps for the LLM models, such as LangChain. Then, write all the AI logic yourself directly to the APIs, this is much simpler than many realize. This approach gives you 100% control over your own logic flow. Additionally, you can check out our open-source Policy Synth agent class library, where, due to it’s polymorphic nature, you have full control over the level of abstraction you want to use, if any: GitHub - CitizensFoundation/policy-synth: Policy Synth is a Typescript class-based library for creating AI agent logic flows, API's and state of the art realtime web applications. The drive behind the project is to help governments and citizens make better decisions together by seamlessly integrating collective and artificial intelligence.

I think it will be very interesting to see how thinking and implementations about this will develop.
A lot of questions on the forum are about ‘flow control’ and ‘logic’ (including function calling) - and I find my self wondering about design decisions around it every day. What part of the ‘logic’ do I build - vs ‘prompt’. And when I try to ‘prompt’ and it doesn’t work the way I expect (always) - do I fall back to ‘build’.

Right now I feel my role becoming more of an orchestrator - trying to create reliable connections - and explain the ways of interacting between systems.

I think coders like ‘control’ and code can be buggy (but always seems 'fixable) wheres GPT’s can seem ‘fuzzy’ - but often it really is that we’re actually not very good at writing clear instructions. (Which is funny to some extent - in actual code we have to be super precise and detailed - in prompts we often seem to think that ‘a few words’ should be enough.

Interesting times!

p.s. I love the Assistants and run 12 of them across Email, chat bot and internal applications, connecting with Salesforce, Pitchbook and a few others.


Problem is the cost. A managed vector db will cost you minimum of 70-100 usd per month to start with, Then the cost of deploying LLM or the token cost if you use Open AI or any such platform. Then another issue will be latency for all this service.

If you go with PineCone. If you go with Weaviate, that cost drops to $25 a month for their SaaS. If you download their open source package and install locally, that cost drops to 0 per month.

Hate to break this to you, Pal, but GPTs are… coded as well. Just because you don’t have to code to use them doesn’t mean they aren’t computer programs like everything else. In fact, every no-code, chat with your pdf, LangChain, whatchamacallit out there is, in the end, developed with CODE.

Which means, they are as susceptible to “buggy” as anything else.

All that cool no-code stuff you’re using? Somebody wrote the code so you didn’t need to.

I’m fully aware!
I am talking about the dilemma that ‘we’ collectively are faced with with these GPTS. What to ‘code’ and what to ‘prompt’. And yes, under everything is code.
And writing a prompt is writing code - in a language that we are supposed to be fluent in, but the interpeter running our prompt code, unlike with ‘regular code’ will return slightly (or sometimes -wildly-) different results for the same prompt and the same input.

1 Like

What about the cost of setting up a low latency infrastructure for chat to work properly. I don’t know how people do, but when you go for production DB, it should be setup with redundant infrastructure and the cost involved will shoot up exponentially. Unless you have people who can setup such infrastructure and maintain it, always go with SaaS option. Now coming back to using pine cone, still you have to pay for LLM tokens additionally and need to input the whole conversation for building the context. In my opinion it will be better to optimize assistant API by splitting the threads to 4 conversation/thread and append last 2 or 3 conversation for the context.

In my case, using OpenSource software for infrastructure (Drupal CMS), and on AWS EC2 instance that is duplicated daily. Our costs are low, but you do need people who can maintain it. That’s for sure.

This is true in any case.

Definitely not true – not at least with the chat completion API. I have used the Standalone Question method successfully for nearly a year now. Even made a video about it:

In your case, this is true. Assistants API was made for you. My point is, it is not always the best tool for everyone. I have nearly 200,000 objects in my vector store. Not something I want to try and fit inito 20 files.

Yes, it may not suitable for such a complex use cases.