Assistants API is Killing Me

anon79482835 · February 11, 2024, 10:46am

Definitely more expensive.
In terms of flexibility, there’s a trade-off for sure. It simplifies your solution somewhat, since the core functionality is handled by OpenAI, but it’s rigid with anything to do with retrieval, etc.

sbm · February 11, 2024, 10:53am

wow I didn’t even think of the cost, I thought it would only be 2x more expensive than the completions

what are you using instead???

sbm · February 11, 2024, 10:54am

so you recommend using 3.5 over 4?

my use case is:
Medical Sales doctor (it diagnoses the user’s “symptoms”, then sells a product from the knowledge base)

DevGirl · February 11, 2024, 10:59am

Just to clarify – there is barely any effort in infrastructure or coding by offloading the embedding to Pinecone.

You’re simply trading one store for another. Because Pinecone is SaaS you have zero infrastructure effort and it doesn’t require code.

If that route isn’t ideal, GPT-3.5 shouldn’t be an issue, since it’s primarily powered by the RAG, more so than zero-shot/training.

In both cases, I think you’re just finding a way to reduce compute which reduces cost. And the only real investment is the learning if you’ve never architected in this manner. However, that’s only about an hour of effort.

If you’d like help implementing either, I’m happy to share whatever I can.

anon79482835 · February 11, 2024, 11:29am

I’m stuck using GPT-4. It does way better than 3.5 with conversational agents that need to be precise. In your case, sounds like the precision is very much needed. If Assistants is too expensive for you, then going with one of the above suggestions (e.g., building your own RAG system) will be the cheaper option (operationally).

anon79482835 · February 11, 2024, 11:30am

That’s awesome of you, thank you! I’ve not played around with Pinecone yet, but I’m reading the docs right now and I’ll see how that goes. I may ping you privately if I get stuck.

DevGirl · February 11, 2024, 11:42am

Even if you don’t use it for this project, it’s extremely valuable if you’re doing any development with LLMs/RAG and will be worth your time, I promise you.

One of our users, @mehulgupta2016154, recently wrote a book on LangChain.

They also shared their YouTube channel, which has some very quick/simple vector DB tutorials in Python.

This particular video utilizes ChromaDB (you can run locally), but it’s a nice conceptual view:

I promise I’m not going off-topic – I know we’re not talking about LangChain but it provides a library that streamlines portions of the process.

anon79482835 · February 11, 2024, 11:58am

Thanks @DevGirl, much appreciated

mehulgupta2016154 · February 11, 2024, 2:35pm

Thanks @DevGirl for the mention.

sbm · February 11, 2024, 7:59pm

I built my own RAG system in 10 minutes lol, I am no longer using the Assistants API

anon79482835 · February 11, 2024, 8:16pm

Building a RAG system isn’t the challenge here. It’s all the prep that goes into the data and all the subsequent updates. Managing context, including metadata, testing, etc. With Assistants API, you just upload a file and it does the rest… and it seems to handle the documents quite well.

What’s your approach to the data?
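As a rough illustration of the data prep mentioned above, here is a minimal sketch of one common step: splitting a document into overlapping chunks and attaching metadata to each. The names `Chunk` and `chunk_document`, the word-based window sizes, and the `catalog.txt` source are all hypothetical choices for this example, not something taken from the Assistants API or any particular library.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_document(text, source, max_words=50, overlap=10):
    """Split text into overlapping word windows, tagging each with metadata."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + max_words]
        chunks.append(Chunk(
            text=" ".join(window),
            metadata={"source": source, "start_word": start, "n_words": len(window)},
        ))
        if start + max_words >= len(words):
            break  # this window already reached the end of the document
    return chunks


# Hypothetical 120-word document, used only to exercise the function.
sample = " ".join(f"word{i}" for i in range(120))
chunks = chunk_document(sample, source="catalog.txt")
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, and the metadata makes it possible to re-chunk or replace a single source file later without rebuilding everything.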

anon79482835 · February 11, 2024, 8:17pm

But for more complex conversations GPT-4 performs better IMO

stevenic · February 11, 2024, 9:19pm

For those of you wanting to try out building your own RAG system I’ll throw out another option which is Vectra, the local Vector DB I created. It has full parity with Pinecone but it’s free. It also has a fairly robust document import mechanism. A number of projects are using it so it’s starting to get fairly polished:

JS Version: GitHub - Stevenic/vectra: Vectra is a local vector database for Node.js with features similar to pinecone but built using local files.
Python Version: GitHub - BMS-geodev/vectra-py: WIP port of the vectra js in memory vector database.

I’d still recommend something like Pinecone for large scale production scenarios but Vectra is a great way to give RAG a try without having to spend money or create any sort of account

icdev2dev · February 11, 2024, 9:26pm

probably off topic on this thread, but thought I would ask anyways… What are the differences between Vectra and LanceDB?

stevenic · February 11, 2024, 9:35pm

I haven’t looked at LanceDB. Honestly I built Vectra because I’m a JS/TS developer and when I started tinkering with RAG there weren’t any JS based solutions. A friend (and now one of my employees) ported it to Python for me. In fact when I started exploring LLMs there weren’t any JS tools period so I ended up having to build my own entire stack. I’m an SDK Architect though so realistically I would have built my own stack anyway.

These days I would humbly say that I’ve become an expert in all things RAG and I have a whole slew of improvements coming to Vectra 2. I looked at LlamaIndex for instance and while it’s awesome I don’t like the file format. It’s too large. My goal with Vectra 2, and another library I’m building called DSTRUCTure, is to build the state of the art suite of tools for all things RAG.

stevenic · February 11, 2024, 9:54pm

Ok so I did a quick scan of LanceDB and it looks cool. Rust core with numerous language bindings. But here’s my issue with all of these vector DBs. They only ever tackle one part of the RAG pipeline… the middle part, the DB. They leave it up to the developer to build the other parts of the pipeline, Ingest & Render, which are arguably the more difficult parts of the pipeline. You can use LangChain to help with the Ingest side but there’s lots of innovation left to be done on the Ingest side of things and nobody is really trying to innovate on the Render side of things. The work I did on Document Sections in Vectra is still the most significant effort I’ve seen to innovate on the Render side and there’s lots of room for improvement there.
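To make the three stages concrete, here is a toy sketch of Ingest, DB, and Render in plain Python. It is not how Vectra, LanceDB, or LangChain implement any of this: the bag-of-words `embed` function stands in for a real embedding model, and the function names, the character budget, and the sample documents are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def ingest(docs):
    """Ingest: split each document into paragraphs and embed them."""
    index = []
    for name, text in docs.items():
        for para in (p.strip() for p in text.split("\n\n")):
            if para:
                index.append({"source": name, "text": para, "vector": embed(para)})
    return index


def search(index, query, top_k=2):
    """DB: rank stored chunks by cosine similarity to the query."""
    qv = embed(query)
    ranked = sorted(index, key=lambda c: cosine(qv, c["vector"]), reverse=True)
    return ranked[:top_k]


def render(chunks, question, max_chars=500):
    """Render: pack retrieved chunks into a prompt without exceeding a size budget."""
    parts, used = [], 0
    for c in chunks:
        block = f"[{c['source']}] {c['text']}"
        if used + len(block) > max_chars:
            break  # stop at the first chunk that would overflow the budget
        parts.append(block)
        used += len(block)
    context = "\n".join(parts)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical knowledge base, used only to exercise the pipeline.
docs = {
    "kettle.txt": "The blue kettle boils one litre of water in three minutes.\n\n"
                  "Descale the kettle monthly with vinegar.",
    "toaster.txt": "The toaster has six browning levels and a crumb tray.",
}
index = ingest(docs)
hits = search(index, "how do I descale the kettle")
prompt = render(hits, "How do I descale the kettle?")
```

Even in this toy version, most of the judgment calls sit in `ingest` (where to split) and `render` (what to include and in what order), which matches the point that the DB is the easy middle part.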

I build tools for developers and my goal is always to create as turnkey a solution as I can for developers. The Teams AI Library I designed, for example, virtually eliminates the need for prompt engineering. I applaud innovation wherever I see it; I just always prefer holistic turnkey solutions personally.

dominique3 · February 12, 2024, 1:48pm

Maybe “LlamaIndex” is useful, it is open source, free (MIT License) and can build a RAG system with a few lines of code.

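import os

# Placeholder: supply your own key, ideally from the environment rather than source code.
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

# On llama-index 0.10 and later, import these from llama_index.core instead.
from llama_index import VectorStoreIndex, SimpleDirectoryReader

# Load every file in the directory, then embed and index the documents.
documents = SimpleDirectoryReader("YOUR_DATA_DIRECTORY").load_data()
index = VectorStoreIndex.from_documents(documents)

To query:

query_engine = index.as_query_engine()
response = query_engine.query("YOUR_QUESTION")
print(response)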

I haven’t used it myself yet but it seems to be quite popular.

sbm · February 13, 2024, 6:53am

I just use Flowise / Langflow. It makes the whole process super easy: just connect nodes and it does the rest. Not gonna lie, the speed ended up being exactly the same as the Assistants API, and the costs are similar, so I might go back to Assistants.

The Assistants API itself isn’t slow; it’s GPT-4 being slow that makes the AI take so long.

Cristian74 · February 13, 2024, 6:25pm

Can these RAG solutions dynamically update the data they hold too? So if a user gave it relevant info in a prompt, it could store it for later use. Edit: just asked ChatGPT, yes they can. Might try it out but I will have to use a Flask webserver or websocket as the platform my app is on won’t run Python code (I use a C# wrapper class for using the OpenAI API).
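For what storing user-supplied info "for later use" can look like, here is a minimal in-memory sketch. `MemoryStore`, its methods, and the record ids are hypothetical, and the Jaccard token overlap is a stand-in for real embedding similarity; a hosted or local vector DB would expose its own equivalent of add/update and query.

```python
import re


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class MemoryStore:
    """Tiny in-memory store that can be updated between queries."""

    def __init__(self):
        self.records = {}  # record id -> text

    def upsert(self, record_id, text):
        """Add a new record, or overwrite the one already stored under this id."""
        self.records[record_id] = text

    def delete(self, record_id):
        """Remove a record if present; unknown ids are ignored."""
        self.records.pop(record_id, None)

    def search(self, query, top_k=1):
        """Rank records by Jaccard overlap between query and record tokens."""
        q = tokens(query)

        def score(text):
            t = tokens(text)
            union = q | t
            return len(q & t) / len(union) if union else 0.0

        ranked = sorted(self.records.values(), key=score, reverse=True)
        return ranked[:top_k]


store = MemoryStore()
store.upsert("hours", "The shop opens at nine on weekdays.")

# A fact supplied by the user mid-conversation is stored straight away...
store.upsert("user:allergy", "The user is allergic to peanuts.")
# ...and can be retrieved on a later turn.
later = store.search("is the user allergic to anything")

# Re-using an id replaces stale information instead of duplicating it.
store.upsert("hours", "The shop opens at ten on weekdays.")
```

Keying records by id is the design choice that matters here: it lets an update overwrite the old fact rather than leaving two conflicting versions for retrieval to choose between.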
