Hi @ikonodim,
At 2,000 tokens, the knowledge base isn't particularly large, but I understand that resending it in every request adds cost that grows with each successive turn.
I wrote a tutorial on using embeddings to retrieve relevant context for an AI assistant, which you may find helpful.
Additionally, here’s the OpenAI Cookbook with a bunch of tutorials. At the bottom, you’ll find RAG implementations with some popular vector DBs.
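To give you a feel for the retrieval step those tutorials cover, here's a minimal sketch. Note that `embed` below is a toy bag-of-words stand-in so the snippet runs offline; in practice you'd replace it with a call to a real embeddings endpoint (e.g. `client.embeddings.create(...)`), and the chunk texts are just made-up examples.

```python
import math

# Toy stand-in for a real embeddings API call -- returns a sparse
# word-frequency vector so this sketch runs without a network call.
def embed(text: str) -> dict[str, float]:
    words = text.lower().split()
    return {w: words.count(w) / len(words) for w in set(words)}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity between two sparse vectors.
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, chunks: list[str], top_k: int = 1) -> list[str]:
    # Embed the question, rank knowledge-base chunks by similarity,
    # and return only the most relevant ones for the prompt.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]

# Hypothetical knowledge-base chunks for illustration.
chunks = [
    "Refunds are processed within 5 business days.",
    "Shipping is free on orders over $50.",
    "Support hours are Monday through Friday.",
]
print(retrieve("how long do refunds take", chunks))
```

This way you only pay for the handful of chunks that are actually relevant to the current question, rather than the whole 2,000-token knowledge base on every turn.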
I'd also recommend separating the instructions from the context (knowledge), so the model's behavior remains appropriate even when no relevant context is found.