Several proposals. In my experience, you get the best behavior when you actually combine all of them:
- Clearly specify the questions that should not be answered via prompt-engineering. Stuff such as “You should always refuse to answer questions that are not related to this specific domain” should help a lot.
- Include binary classifiers that determine whether a question is “on-topic” or “off-topic” for your particular use case. You can use cheap fine-tuned OpenAI models for this or open source stuff (Huggingface).
- Include a minimum threshold of similarity when retrieving documents to answer questions. If no documents surpasses this threshold, decline to answer politely (with a pre-specified formula).
- Use content moderation (OpenAI’s free endpoint) to filter out inappropriate requests.
- Include reg-exp filtering to add a extra security layer to stuff such as prompt-injection (especially if you’re exposing your app to external customers).
- Probably many others
Hope that helps!!