I started building out a new project with the Assistants API because of its built-in thread and context management, but after testing the MVP, I want more granular control over when I use which model, how context is summarized, max tokens, etc. Is there a planned release for at least some of these features?
If not, what is everyone else using now instead of the Assistants API or LangChain? Is it basically just roll-your-own thread and context management with the completions endpoint?
Yep - especially with AI code generation, getting these systems up and running is becoming incredibly easy. We found LangChain (and even the OpenAI libs) to be an encumbrance. However, it should probably be said that some of us have been doing this for quite a while now, so we might have blinders on regarding how hard or easy things are for beginners.
So it's quite possible that starting with custom GPTs, graduating to Assistants, maybe using a framework, and eventually "rolling your own" might be a reasonable path.
Appreciate it - I thought Assistants would be sufficient, but I'm going to move over to rolling my own thread and context management setup. Outside of LangChain, is there anything else I should consider, or should I just crank out my own setup over a couple of days?
Hmm - probably depends on your skill as a software developer.
We found LangChain to be too restrictive, but some people seem to be getting decent results with no-code tools. Perhaps a mix could be a good option depending on what you're trying to do.
Hmm hmm hmm
I have no solid advice here other than "stay modular and don't skip your automated tests".
That's just an opinion. I would say it feels very similar to a lot of these no-code tools.
We (as a company) gave it a shot but found that if you want to extend something and try new things, it becomes pretty cumbersome due to a lack of easily accessible documentation and a high level of abstraction. It felt like building the necessary adapters would take more effort than building the functionality LangChain offers from scratch.
Of course, if you're a diehard LangChain fan, super comfortable with the framework, and it works for you, that's great! Maybe you can teach us the way
As someone who's been developing my own AI applications without LangChain or Python, I second the motions above. Don't get me wrong: I highly recommend the Assistants API and/or LangChain when they are the right tools for the job.
If they aren't, and you have (or have access to someone who has) software development skills, then building your own is a viable option. All of these systems, no matter what language or libraries, are essentially making the same API calls to the models: chat completions, assistants, and embeddings. EVERYBODY going through the API is using the same basic references, which are documented here: https://platform.openai.com/docs/api-reference
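To make "the same API calls" concrete, here's a minimal sketch of rolling your own context management over the Chat Completions endpoint. The helper name `build_messages` and the history cap are my own illustrative choices, not anything from the API docs:

```python
MAX_HISTORY = 6  # keep only the last N turns; an assumed, tunable budget

def build_messages(system_prompt, history, user_input, max_history=MAX_HISTORY):
    """Assemble the message list for a chat completion call,
    keeping only the most recent turns of the conversation."""
    recent = history[-max_history:]
    return ([{"role": "system", "content": system_prompt}]
            + recent
            + [{"role": "user", "content": user_input}])

# Calling the model would then look roughly like this
# (requires `pip install openai` and an API key):
#
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(
#     model="gpt-4o",
#     messages=build_messages("You are helpful.", history, "Hello"),
# )
```

The point is just that "thread management" is ultimately a list of message dicts you control yourself, so swapping models or changing how history is trimmed becomes trivial.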
I made the commitment from the start to build myself. As it turns out, the coding was the easiest part. The hardest part was understanding how everything works.
I'm working on configurable assistants that use the standard Chat Completions API in the background, and I'm interested in what your expectations would be regarding configurability. The prompt and the way the conversation is summarized are clear to me. What I'm more curious about is the configurability of RAG: which aspects should be adjustable? Also, regarding the code interpreter: should it be possible to set which libraries are installed, and should it support languages like Python and JavaScript? What are your expectations?
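For what it's worth, one way to frame the question: the RAG knobs people usually want could be captured in a small config object. Every field name and default here is hypothetical, just to illustrate the kind of configurability being asked about:

```python
from dataclasses import dataclass

@dataclass
class RagConfig:
    """Illustrative RAG settings a configurable assistant might expose.
    None of these names come from a real API; they are assumptions."""
    embedding_model: str = "text-embedding-3-small"
    chunk_size: int = 800          # tokens per chunk when indexing
    chunk_overlap: int = 100       # overlap between adjacent chunks
    top_k: int = 5                 # retrieved chunks per query
    score_threshold: float = 0.7   # drop weak matches below this score

# A user could then override just what they care about:
cfg = RagConfig(top_k=8)
```

Chunking strategy, retrieval count, and a relevance cutoff seem like the minimum set most people would expect to tune.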
I think it will be very interesting to see how thinking about this, and the implementations, develop.
A lot of questions on the forum are about "flow control" and "logic" (including function calling), and I find myself wondering about design decisions around it every day. What part of the "logic" do I build vs. "prompt"? And when I try to "prompt" and it doesn't work the way I expect (always), do I fall back to "build"?
Right now I feel my role becoming more of an orchestrator - trying to create reliable connections - and explain the ways of interacting between systems.
I think coders like "control", and code can be buggy (but always seems "fixable"), whereas GPTs can seem "fuzzy"; but often it's really that we're not very good at writing clear instructions. (Which is funny to some extent: in actual code we have to be super precise and detailed, yet in prompts we often seem to think that "a few words" should be enough.)
Interesting times!
p.s. I love the Assistants and run 12 of them across email, chatbot, and internal applications, connecting with Salesforce, Pitchbook, and a few others.
The problem is the cost. A managed vector DB will cost you a minimum of $70-100 USD per month to start with, then there's the cost of deploying an LLM, or the token cost if you use OpenAI or a similar platform. Then another issue is the latency across all of these services.
That's if you go with Pinecone. If you go with Weaviate, the cost drops to $25 a month for their SaaS. If you download their open-source package and install it locally, the cost drops to $0 per month.
Hate to break this to you, pal, but GPTs are… coded as well. Just because you don't have to code to use them doesn't mean they aren't computer programs like everything else. In fact, every no-code, chat-with-your-PDF, LangChain, whatchamacallit out there is, in the end, developed with CODE.
Which means they are as susceptible to "buggy" as anything else.
All that cool no-code stuff you're using? Somebody wrote the code so you didn't need to.
I'm fully aware!
I am talking about the dilemma that "we" collectively are faced with regarding these GPTs: what to "code" and what to "prompt". And yes, under everything is code.
And writing a prompt is writing code, in a language we are supposed to be fluent in; but the interpreter running our prompt code, unlike with "regular code", will return slightly (or sometimes wildly) different results for the same prompt and the same input.
What about the cost of setting up a low-latency infrastructure for chat to work properly? I don't know how others do it, but a production DB should be set up with redundant infrastructure, and the cost involved shoots up quickly. Unless you have people who can set up and maintain such infrastructure, always go with the SaaS option. Coming back to Pinecone: you still have to pay for LLM tokens on top of that, and you need to send the whole conversation to build the context. In my opinion it is better to optimize around the Assistants API by splitting threads at around four conversations per thread and appending the last 2 or 3 conversations for context.
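The "append the last 2 or 3 conversations" idea is easy to sketch. This trivial helper (a name I made up) just slices the recent exchanges; older turns could be summarized instead of dropped:

```python
def trimmed_context(exchanges, keep=3):
    """Keep only the last `keep` user/assistant exchanges.

    `exchanges` is a list of (user_msg, assistant_msg) pairs. Anything
    older than the kept window could be run through a summarization
    call instead of being discarded outright."""
    return exchanges[-keep:]
```

Either way, the conversation you resend stays bounded, so per-request token cost stops growing with thread length.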
In my case, I use open-source software for infrastructure (Drupal CMS) on an AWS EC2 instance that is duplicated daily. Our costs are low, but you do need people who can maintain it. That's for sure.
This is true in any case.
Definitely not true, at least not with the Chat Completions API. I have used the Standalone Question method successfully for nearly a year now. I even made a video about it: https://youtu.be/B5B4fF95J9s
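For anyone unfamiliar with it, the Standalone Question method boils down to rewriting the latest follow-up into a self-contained question before retrieval, so the full history never needs to be resent. A rough sketch (the prompt wording and function name are mine, not taken from the video):

```python
STANDALONE_PROMPT = (
    "Given the conversation so far and a follow-up question, "
    "rewrite the follow-up as a fully self-contained question.\n\n"
    "History:\n{history}\n\nFollow-up: {question}\n\nStandalone question:"
)

def standalone_question_prompt(history, question):
    """Build the rewriting prompt from (role, message) pairs.
    The model's answer to this prompt (a single standalone question)
    is then used for retrieval and answering on its own."""
    text = "\n".join(f"{role}: {msg}" for role, msg in history)
    return STANDALONE_PROMPT.format(history=text, question=question)
```

One cheap rewriting call replaces resending the whole thread, which is what keeps per-turn token cost roughly constant.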
In your case, this is true; the Assistants API was made for you. My point is that it is not always the best tool for everyone. I have nearly 200,000 objects in my vector store, which is not something I want to try and fit into 20 files.