General knowledge AND custom trained knowledge

I’m using Python together with LangChain and OpenAI’s API to train a chatbot using my custom data. That seems to work well regarding the custom data, meaning that for every question regarding information I feed into the model with LangChain, I get decent answers. However the chatbot seems to use this as an only knowledge base, so I can’t ask it anything else. Is this normal?

I was expecting to have a similar broad general knowledge as when using ChatGPT, but on top of that the knowledge from my custom data. Now I get only the knowledge from my custom data. Is there a way to achieve having a chatbot with the general knowledge AND the custom trained knowledge?

I have tried to use different training data (digitally created PDFs, scanned PDFs, CSVs, Plain-Text), different models and different model parameters, but I don’t manage to improve my outcome.

1 Like

Due to your use of LangChain obfuscating what you’re actually doing with the underlying models/APIs(for one, it sounds more like you’re using embedding retrieval than fine-tuning), it would probably be best to share the specific modules you’re using with the LangChain community, so they can help you better understand what you need or what’s going on under the hood, like which API endpoints are being used, what system prompt is preventing answers outside your knowledge based, etc.


Thanks for the reply @wfhbrian . Based on your comment I assume that usually it should work with both knowledge sources, correct?

I can’t tell you what LangChain is doing under the hood, so I don’t know what you implemented.

However, if you used the API directly, you would use your knowledgbase to retrieve information, then insert what’s retrieved as context in the system prompt when generating an output. At the time of the generation, unless you explicitly instructed the model to do otherwise within the system prompt, the model will respond based on both the context retrieved from the knowledgbase and the base knowledge of the model itself.