Hey there everyone!
I’ve been lurking on the forums for a bit, but decided to post a question and get feedback from the community.
I’m working on a chatbot for my company that uses GPT models (GPT-3.5, GPT-4) to perform RAG over our proprietary documents. I’ve explored the ReAct pattern combined with “memory” for conversation tracking, along with tools (function calling). Is there a more optimized approach, or any recommendations this community can offer based on current best practices?
Additionally, I’m curious about Bing’s RAG implementation. I’ve observed that Bing’s chat can distinguish between follow-up queries and questions that require a broader internet search, and that the engine rewrites the original question before searching further. I assume their engine is a combination of prompt engineering and multiple LLM calls. Has anyone come across a systematic approach to replicating an efficient retrieval engine like this?
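To make the question more concrete, here is a rough sketch of the flow I imagine: one LLM call decides whether the new message needs retrieval, a second rewrites it into a standalone query, and a third answers. This is only my guess at the general pattern, not how Bing actually works. All names (`needs_retrieval`, `rewrite_query`, `answer`, the `LLM` and `Retriever` types) are hypothetical, and the model and search backend are passed in as plain functions so nothing here depends on a specific API.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical types: an LLM is any function prompt -> text,
# a retriever is any function query -> list of document snippets.
LLM = Callable[[str], str]
Retriever = Callable[[str], List[str]]


@dataclass
class Conversation:
    """Very simple "memory": a list of past turns."""
    turns: List[str] = field(default_factory=list)

    def history(self, last_n: int = 6) -> str:
        return "\n".join(self.turns[-last_n:])


def needs_retrieval(llm: LLM, convo: Conversation, question: str) -> bool:
    """LLM call 1: follow-up answerable from history, or new search needed?"""
    prompt = (
        "Conversation so far:\n" + convo.history() + "\n\n"
        "New user message: " + question + "\n"
        "Can this be answered from the conversation alone? Reply YES or NO."
    )
    return not llm(prompt).strip().upper().startswith("YES")


def rewrite_query(llm: LLM, convo: Conversation, question: str) -> str:
    """LLM call 2: fold the history into a self-contained query."""
    prompt = (
        "Conversation so far:\n" + convo.history() + "\n\n"
        "Rewrite the message below as a standalone search query.\n"
        "Message: " + question
    )
    return llm(prompt).strip()


def answer(llm: LLM, retriever: Retriever, convo: Conversation, question: str) -> str:
    """Route the question, retrieve if needed, then answer (LLM call 3)."""
    context = ""
    if needs_retrieval(llm, convo, question):
        query = rewrite_query(llm, convo, question)
        context = "\n".join(retriever(query))
    prompt = (
        "Context documents:\n" + context + "\n\n"
        "Conversation so far:\n" + convo.history() + "\n\n"
        "Answer the user: " + question
    )
    reply = llm(prompt).strip()
    # Update the memory so the next turn can be treated as a follow-up.
    convo.turns.append("User: " + question)
    convo.turns.append("Assistant: " + reply)
    return reply
```

In practice `llm` would wrap a chat completion call and `retriever` a vector store lookup. What I’m unsure about is whether three separate calls per turn is reasonable, or whether people fold the routing and rewriting into a single function-calling step.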
If there are articles or blogs that cover what I’m describing, I would greatly appreciate any pointers.