At least it works!
If you just want to be able to say “Please write a Lead Generator email”, then you could simply split this into separate Assistants: a Lead Generator Assistant, a Webinar Assistant, and so on.
Otherwise:
It makes sense that you want to use RAG for this. Typically the retrieval runs on every chat - it’s just not doing what you expected.
The retrieval system is very… black-boxed. Even for an already black-boxed AI. It’s just not very fun to work with.
Since you have a working product and can now focus on efficiency you can build your own simple RAG system. It’s a lot easier than most places make it seem.
You can use a powerful embedding model like text-embedding-3-large and even give it full dimensions (3072), although lower dimensions are more than suitable for most tasks, and then just create a simple JSON file to hold the embeddings. Your file would only be roughly 3 KB to 42 KB per item (you can reduce this to roughly 2 KB to 25 KB using pickle/numpy/whatever).
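As a rough sketch of the JSON file idea: the snippet below stores one vector per label and reads it back. The function names, the `embeddings.json` file name, and the labels are all made up for illustration, and the 4-dimensional toy vectors stand in for real embeddings (the actual embedding API call is left out on purpose).

```python
import json
import os
import tempfile


def save_embeddings(path, items):
    """Write a {label: [floats]} mapping to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(items, f)


def load_embeddings(path):
    """Read the {label: [floats]} mapping back into memory."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Toy 4-dimensional vectors standing in for real embeddings (assumption:
# in practice each list would come from your embedding model).
items = {
    "lead_generator": [0.9, 0.1, 0.0, 0.1],
    "webinar": [0.1, 0.8, 0.2, 0.0],
}

# Hypothetical file location; any writable path works.
path = os.path.join(tempfile.gettempdir(), "embeddings.json")
save_embeddings(path, items)
loaded = load_embeddings(path)
```

You would embed each set of instructions once, save the file, and then only load it at startup.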
Now you can run a simple dot product between the embedded query and the embeddings in your in-memory JSON file (the numpy library can do this), and then map the best match to the correct instructions/context.
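Here is a minimal sketch of that lookup with numpy. The `top_match` helper, the labels, the instruction strings, and the toy 4-dimensional vectors are all assumptions for illustration; in real use the query vector would come from the same embedding model as the stored ones. The vectors are normalized first so the dot product behaves like cosine similarity even if they are not already unit length.

```python
import numpy as np


def normalize(v):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


def top_match(query_vec, embeddings):
    """Return (label, score) of the stored embedding closest to the query."""
    labels = list(embeddings)
    matrix = np.stack([normalize(embeddings[k]) for k in labels])
    scores = matrix @ normalize(query_vec)  # one dot product per stored item
    best = int(np.argmax(scores))
    return labels[best], float(scores[best])


# Toy stand-ins for the contents of the JSON file (assumption).
embeddings = {
    "lead_generator": [0.9, 0.1, 0.0, 0.1],
    "webinar": [0.1, 0.8, 0.2, 0.0],
}

# Hypothetical mapping from label to the instructions/context to inject.
instructions = {
    "lead_generator": "Instructions for writing a lead generator email.",
    "webinar": "Instructions for writing a webinar invitation.",
}

query_vec = [0.8, 0.2, 0.1, 0.0]  # would be the embedded user query
label, score = top_match(query_vec, embeddings)
context = instructions[label]
```

The selected `context` is what you would then pass to the model alongside the user's message.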
It will be very fast, not cost anything (besides embedding the query), not require any external databases, and give you much more control over your RAG.
All you would do is compare these and return the top result. If you find bias in the results, you can adjust the embeddings by applying a weighted centroid calculation (give 90% to the main embedding and 10% to the bias you are trying to eliminate).
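One possible reading of the weighted centroid adjustment, sketched with numpy: blend the stored embedding with a second "correction" vector at 90/10 and renormalize. The `weighted_centroid` name, the 2-dimensional toy vectors, and the choice of what the correction vector represents (for example, the embedding of a query that keeps getting routed wrongly) are assumptions, not a fixed recipe.

```python
import numpy as np


def weighted_centroid(main, correction, main_weight=0.9):
    """Blend two vectors (90/10 by default) and rescale to unit length."""
    main = np.asarray(main, dtype=float)
    correction = np.asarray(correction, dtype=float)
    blended = main_weight * main + (1.0 - main_weight) * correction
    return blended / np.linalg.norm(blended)


# Toy 2-dimensional vectors for illustration only (assumption).
main = np.array([1.0, 0.0])        # the stored embedding
correction = np.array([0.0, 1.0])  # the direction you want to nudge toward

adjusted = weighted_centroid(main, correction)

# The adjusted vector now scores slightly higher against the correction
# direction than the original did, while staying close to the original.
before = float(main @ correction)
after = float(adjusted @ correction)
```

You would save `adjusted` back in place of the original embedding and rerun your test queries to see whether the ranking improved.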