I started building out a new project with the Assistants API because of its built-in thread and context management, but after testing the MVP, I want more granular control over when I use which model, how context is summarized, max tokens, etc. Is there a planned release for at least some of these features?
If not, what is everyone else using now instead of the Assistants API or LangChain? Is it basically just roll-your-own thread and context management on top of the chat completions endpoint?
Yep – especially with AI code generation, getting these systems up and running is becoming incredibly easy. We found LangChain (and even the OpenAI libs) to be an encumbrance. However, it should probably be said that some of us have been doing this for quite a while now, so we might have blinders on regarding how hard or easy things are for beginners.
So it’s quite possible that starting with custom GPTs, graduating to assistants, maybe using a framework, and eventually “rolling your own” might be a reasonable path.
Appreciate it – I thought Assistants would be sufficient, but I’m going to move over to rolling my own thread and context management setup. Outside of LangChain, is there anything else I should consider, or should I just crank out my own setup over a couple of days?
That’s just an opinion. I would say it feels very similar to a lot of these no-code tools.
We (company) gave it a shot, but found that if you want to extend something and try new things, it becomes pretty cumbersome due to a lack of easily accessible documentation and the high level of abstraction. It felt like building the necessary adapters would take more effort than building the functionality LangChain offers from scratch.
Of course if you’re a diehard LangChain fan and super comfortable with the framework and it works for you, that’s great! Maybe you can teach us the way
As someone who’s been developing my own AI applications without LangChain or Python, I second the motions above. Don’t get me wrong: I highly recommend the Assistants API and/or LangChain when they are the right tools for the job.
If they aren’t, and you have (or have access to someone who has) software development skills, then building your own is a viable option. All of these systems, no matter what language or what libraries, are essentially making the same API calls to the models – chat completions, assistants, and embeddings. EVERYBODY going through the API is using the same basic reference, which is documented here: https://platform.openai.com/docs/api-reference
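To make the “roll your own” point concrete, here’s a minimal sketch of thread and context management for the chat completions endpoint. The `Thread` class and `MAX_HISTORY` constant are my own illustrative names, not from any library; the only thing taken from the API reference is the standard `{"role": ..., "content": ...}` message shape.

```python
# Minimal "roll your own" thread management sketch.
# Thread and MAX_HISTORY are illustrative names, not a real library.

MAX_HISTORY = 20  # keep the system prompt plus the last N turns


class Thread:
    def __init__(self, system_prompt):
        self.system = {"role": "system", "content": system_prompt}
        self.turns = []  # alternating user/assistant messages

    def add(self, role, content):
        self.turns.append({"role": role, "content": content})

    def messages(self):
        # Drop the oldest turns first; the system prompt always survives.
        return [self.system] + self.turns[-MAX_HISTORY:]


thread = Thread("You are a helpful assistant.")
thread.add("user", "Hello")

# This is the payload you would POST to /v1/chat/completions:
payload = {"model": "gpt-4o", "messages": thread.messages()}
```

That’s the whole trick: the “thread” the Assistants API manages for you is, at bottom, just a list of messages you resend each turn. Summarization, token budgets, and model switching all become things you control because you build the list yourself.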
I made the commitment from the start to build myself. As it turns out, the coding was the easiest part. The hardest part was understanding how everything works.
I think it will be very interesting to see how the thinking about this, and the implementations, develop.
A lot of questions on the forum are about ‘flow control’ and ‘logic’ (including function calling) – and I find myself wondering about design decisions around it every day. What part of the ‘logic’ do I build vs ‘prompt’? And when I try to ‘prompt’ and it doesn’t work the way I expect (always), do I fall back to ‘build’?
Right now I feel my role becoming more of an orchestrator – trying to create reliable connections and define how the systems interact.
I think coders like ‘control’, and code can be buggy (but always seems ‘fixable’), whereas GPTs can seem ‘fuzzy’ – but often it’s really that we’re not very good at writing clear instructions. (Which is funny to some extent: in actual code we have to be super precise and detailed, while in prompts we often seem to think that ‘a few words’ should be enough.)
p.s. I love the Assistants and run 12 of them across email, a chat bot, and internal applications, connecting with Salesforce, Pitchbook, and a few others.
Problem is the cost. A managed vector DB will cost you a minimum of $70–100 USD per month to start with. Then there’s the cost of deploying an LLM, or the token cost if you use OpenAI or a similar platform. And on top of that, latency across all these services becomes an issue.
Hate to break this to you, Pal, but GPTs are… coded as well. Just because you don’t have to code to use them doesn’t mean they aren’t computer programs like everything else. In fact, every no-code, chat with your pdf, LangChain, whatchamacallit out there is, in the end, developed with CODE.
Which means, they are as susceptible to “buggy” as anything else.
All that cool no-code stuff you’re using? Somebody wrote the code so you didn’t need to.
I’m fully aware!
I am talking about the dilemma that ‘we’ collectively are faced with with these GPTs: what to ‘code’ and what to ‘prompt’. And yes, under everything is code.
And writing a prompt is writing code – in a language we are supposed to be fluent in – but the interpreter running our prompt code, unlike with ‘regular’ code, will return slightly (or sometimes wildly) different results for the same prompt and the same input.
What about the cost of setting up low-latency infrastructure for chat to work properly? I don’t know how others do it, but a production DB should be set up with redundant infrastructure, and the cost involved shoots up quickly. Unless you have people who can set up and maintain such infrastructure, always go with a SaaS option. Now, coming back to using Pinecone: you still have to pay for LLM tokens on top of it, and you need to send the whole conversation to build the context. In my opinion it would be better to optimize the Assistants API by splitting threads at around 4 conversations per thread and appending the last 2 or 3 exchanges for context.
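The “append the last 2 or 3 exchanges” idea above can be sketched as a simple sliding window over the message history. This is only an illustration of the approach, assuming the standard alternating user/assistant message format; `WINDOW` is my own parameter, and real thresholds would be tuned by token count rather than message count.

```python
# Sliding-window context: keep only the most recent exchange pairs.
# WINDOW is an assumed tuning parameter, not from any API.

WINDOW = 3  # number of user/assistant exchange pairs to keep


def windowed_context(history, window=WINDOW):
    """history: alternating user/assistant message dicts.
    Returns only the last `window` exchange pairs."""
    return history[-2 * window:]


history = []
for i in range(5):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

ctx = windowed_context(history)
# ctx holds only questions/answers 2 through 4
```

In practice you’d count tokens (e.g. with a tokenizer) instead of messages, but the shape of the solution is the same: the context you pay for is exactly the slice you choose to send.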
In my case, we use open-source software for infrastructure (Drupal CMS) on an AWS EC2 instance that is duplicated daily. Our costs are low, but you do need people who can maintain it. That’s for sure.
This is true in any case.
Definitely not true – at least not with the chat completions API. I have used the Standalone Question method successfully for nearly a year now. I even made a video about it: https://youtu.be/B5B4fF95J9s
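For anyone unfamiliar with the technique: instead of resending the whole conversation, you ask the model to rewrite the latest follow-up into a self-contained question, then answer (or run retrieval on) only that. Here’s a rough sketch of the prompt-construction step; the wording of the prompt is my own, not taken from the linked video.

```python
# Standalone Question sketch: condense a follow-up plus conversation
# history into one self-contained question. Prompt wording is my own.


def condense_prompt(history, followup):
    transcript = "\n".join(
        f"{m['role']}: {m['content']}" for m in history
    )
    return (
        "Given the conversation below, rewrite the final user message "
        "as a single standalone question that needs no prior context.\n\n"
        f"Conversation:\n{transcript}\n\n"
        f"Final user message: {followup}\n\n"
        "Standalone question:"
    )


history = [
    {"role": "user", "content": "Who wrote Dune?"},
    {"role": "assistant", "content": "Frank Herbert."},
]
prompt = condense_prompt(history, "When was it published?")
# `prompt` goes out as one small chat-completion call; the rewritten
# question ("When was Dune published?") then drives retrieval and the
# final answer, so the full history never needs to be resent.
```

The payoff is that every downstream call sees one short question instead of a growing transcript, which is why it scales where “send the whole conversation” doesn’t.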
In your case, this is true. The Assistants API was made for you. My point is that it is not always the best tool for everyone. I have nearly 200,000 objects in my vector store – not something I want to try to fit into 20 files.