How to Limit question results to proprietary dataset?

Several proposals. In my experience, you get the best behavior when you actually combine all of them:

  • Use prompt engineering to spell out which questions should not be answered. An instruction such as “You should always refuse to answer questions that are not related to this specific domain” should help a lot.
  • Include binary classifiers that determine whether a question is “on-topic” or “off-topic” for your particular use case. You can use cheap fine-tuned OpenAI models for this, or open-source models (e.g. from Hugging Face).
  • Set a minimum similarity threshold when retrieving documents to answer questions. If no document surpasses this threshold, politely decline to answer (with a pre-specified message).
  • Use content moderation (OpenAI’s free endpoint) to filter out inappropriate requests.
  • Include regex filtering as an extra security layer against things such as prompt injection (especially if you’re exposing your app to external customers).
  • Probably many others 🙂
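
To make two of these concrete (the similarity threshold and the regex filter), here is a minimal sketch, not a production setup. Everything named in it is my own assumption for illustration: the patterns, the `MIN_SIMILARITY` value of 0.3, the refusal text, and especially `embed`, which is a toy bag-of-words stand-in for whatever real embedding model you use.

```python
import math
import re
from collections import Counter

# Hypothetical patterns; a real deployment would need a longer, maintained list.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"reveal (your|the) (system )?prompt", re.IGNORECASE),
]

REFUSAL = "Sorry, I can only answer questions about our documentation."
MIN_SIMILARITY = 0.3  # assumed value; tune it on your own data


def looks_like_injection(question: str) -> bool:
    """Return True if the question matches any known injection pattern."""
    return any(p.search(question) for p in INJECTION_PATTERNS)


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, documents: list[str],
             threshold: float = MIN_SIMILARITY) -> list[str]:
    """Return the documents scoring at or above the threshold, best first."""
    q = embed(question)
    scored = [(cosine(q, embed(d)), d) for d in documents]
    return [d for score, d in sorted(scored, reverse=True) if score >= threshold]


def answer(question: str, documents: list[str]) -> str:
    """Refuse on injection attempts or when nothing relevant is retrieved."""
    if looks_like_injection(question):
        return REFUSAL
    context = retrieve(question, documents)
    if not context:
        return REFUSAL
    # In a real app, the retrieved context would be passed to the LLM here.
    return "CONTEXT: " + " | ".join(context)
```

The regex check runs first because it is the cheapest, and the refusal is a fixed string, so an off-topic question never reaches the model at all. With real embeddings the scores will be distributed very differently, so the threshold has to be picked empirically.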

Hope that helps!!
