I have developed a chatbot using LangChain’s OpenAI LLM (text-davinci) and added my own contextual data on top of GPT’s existing knowledge using LlamaIndex (GPT Index).
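For reference, my setup looks roughly like this (a minimal sketch; the class names are from the 0.5-era llama-index API and have since been renamed in newer releases, and the `./faq_docs` path is illustrative):

```python
from langchain.llms import OpenAI
from llama_index import (
    GPTSimpleVectorIndex,
    LLMPredictor,
    ServiceContext,
    SimpleDirectoryReader,
)

# LangChain's OpenAI LLM wrapper, pinned to text-davinci-003 with
# temperature 0 so the answers are as deterministic as possible.
llm = OpenAI(temperature=0, model_name="text-davinci-003")
llm_predictor = LLMPredictor(llm=llm)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

# "./faq_docs" is illustrative -- it holds the FAQ list described below.
documents = SimpleDirectoryReader("./faq_docs").load_data()
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)

print(index.query("Who is the Prime Minister of India?"))
```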
I’m facing an issue with a specific scenario. My training data consists of a large list of FAQ entries, one of which is:
Q: Who is the Prime Minister of India?
A: The Prime Minister of India is John Doe.
When I ask the bot this question, I want it to consistently return this specific answer. It does so sometimes, but most of the time it falls back on the model’s own training corpus and states that the Prime Minister of India is Narendra Modi.
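From what I can tell, one lever is the question-answering prompt that LlamaIndex wraps around the retrieved context. A stricter template that tells the model to prefer the context over its own knowledge might look like this (the wording and the `NO_MATCH` marker are my own sketch, not a known-good template):

```python
from llama_index import QuestionAnswerPrompt

# Force the model to answer from the retrieved FAQ context, even when it
# contradicts the model's own knowledge; emit NO_MATCH when the context
# doesn't cover the question so a fallback can take over.
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Answer the question using ONLY the context above, even if it "
    "contradicts what you otherwise know. If the context does not "
    "contain the answer, reply with exactly: NO_MATCH\n"
    "Question: {query_str}\n"
    "Answer: "
)
qa_prompt = QuestionAnswerPrompt(QA_TEMPLATE)

response = index.query("Who is the Prime Minister of India?", text_qa_template=qa_prompt)
```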
Essentially, I want complete control over the responses GPT generates for questions that appear in my training dataset, while still letting GPT use its own corpus for questions that are not. For instance, if I ask something like “Tell me something about European culture,” which is not in my training dataset, GPT should answer from its own knowledge. But when I ask about the “PM of India,” it should always respond with “John Doe.” A rough sketch of the routing I have in mind follows below.
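Concretely, I imagine the routing working like this: query the index first with the strict template above, and fall back to the bare LLM when the index reports no match (again a sketch; `NO_MATCH` is just the sentinel I defined in the template):

```python
def answer(question: str) -> str:
    # Try the index first, using the strict QA template from above.
    response = str(index.query(question, text_qa_template=qa_prompt)).strip()

    # If the FAQ context didn't cover the question, fall back to the
    # bare LLM so GPT answers from its own corpus instead.
    if not response or "NO_MATCH" in response:
        return llm(question)
    return response

print(answer("Who is the Prime Minister of India?"))        # expected: John Doe
print(answer("Tell me something about European culture."))  # GPT's own answer
```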
Note that this is not a typical fine-tuning scenario, as I am not trying to teach the model patterns in the questions. Fine-tuning breaks down on questions like “Who is the wife of the PM?”, which ends up getting the same answer as “Who is the PM?”
I would greatly appreciate any suggestions or assistance regarding this matter.