Force the Assistant to read the knowledge base files before generating the reply

At least it works!

If all you need is to say “Please write a Lead Generator email”, you could just split your Assistants: a Lead Generator Assistant and a Webinar Assistant.

Otherwise:

It makes sense that you want to use RAG for this. Typically the retrieval runs on every chat - it’s just not doing what you expected.

The retrieval system is very… black-boxed. For an already black-boxed AI. It’s just not much fun to work with.

Since you have a working product and can now focus on efficiency, you can build your own simple RAG system. It’s a lot easier than most guides make it seem.

You can use a powerful model like text-embedding-3-large, even at its full dimensionality (3072) (although lower dimensions are more than suitable for most tasks), and then just create a simple JSON file to hold the embeddings. Your file would only be roughly 3 kB - 42 kB per item (you can reduce this to 2 kB - 25 kB using pickle/numpy/whatever).
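A minimal sketch of that indexing step, assuming the official OpenAI Python client; the knowledge-base items and the `embeddings.json` file name are just placeholders for illustration:

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical knowledge-base items: one text blob per "mode" you want to route to.
items = {
    "lead_gen_email": "Instructions and examples for lead generation emails...",
    "webinar_email": "Instructions and examples for webinar invitation emails...",
}

embeddings = {}
for key, text in items.items():
    resp = client.embeddings.create(model="text-embedding-3-large", input=text)
    embeddings[key] = resp.data[0].embedding  # list of 3072 floats

# Plain JSON is fine at this scale; pickle/numpy shrink it further.
with open("embeddings.json", "w") as f:
    json.dump(embeddings, f)
```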

Now you can run a simple dot product between the embedded query and your in-memory JSON file (the numpy library handles this) and map the winning item to the correct instructions/context.
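Here’s a sketch of that lookup step, continuing the example above. OpenAI embeddings come back unit-normalized, so a plain dot product is effectively cosine similarity:

```python
import json
import numpy as np
from openai import OpenAI

client = OpenAI()

with open("embeddings.json") as f:
    embeddings = json.load(f)

keys = list(embeddings)
matrix = np.array([embeddings[k] for k in keys])  # shape: (n_items, 3072)

def best_match(query: str) -> str:
    resp = client.embeddings.create(model="text-embedding-3-large", input=query)
    q = np.array(resp.data[0].embedding)
    scores = matrix @ q  # dot product against every stored item at once
    return keys[int(np.argmax(scores))]  # top-1 key -> your instructions/context mapping

print(best_match("Write a lead generation email for our SaaS product"))
```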

It will be very fast, not cost anything (besides embedding the query), not require any external databases, and give you much more control over your RAG.

All you would do is compare these and return the top 1. If you find bias in the results, you can adjust the embeddings by applying a weighted centroid calculation (give 90% weight to the main embedding and 10% to the bias you are trying to eliminate).
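A hedged sketch of that adjustment: blend each stored vector 90/10 with a “bias” vector and re-normalize. How you obtain the bias vector (e.g., embedding the phrasing that keeps winning incorrectly) is an assumption here, not something the thread specifies:

```python
import numpy as np

def adjust(main_vec: np.ndarray, bias_vec: np.ndarray, w: float = 0.9) -> np.ndarray:
    # Weighted centroid: 90% original embedding, 10% bias direction (assumed split).
    blended = w * main_vec + (1.0 - w) * bias_vec
    # Re-normalize so dot-product scores stay comparable across items.
    return blended / np.linalg.norm(blended)
```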

Oh. Wow. I’ve heard about this but am unsure how to go about it, or even where to start.

Do I have to use Pinecone, or is it something I can do locally?

Can you point me to a guide on how to do this?

Thanks for sharing your knowledge, brother. I have sent you a DM regarding an opportunity. Please take a look at it and let me know where you stand.

Well… I think I have found a solution, using Pinecone to create a RAG architecture.

A query that used 17,000 tokens on the Assistants API used just 1,400 tokens after I created my RAG architecture.

I used this Jupyter notebook from Pinecone as a template (Google Colab).

Hello, has your problem been resolved? We have also encountered a very similar issue.