Need assistance designing a prompt to classify whether a question is related to the provided context

We have a use case where we need a pipeline in which the model classifies whether a question can be answered from the given context or not. Based on that classification we perform different downstream actions for each route. A typical prompt + instruction sample looks like this:

SYSTEM PROMPT: You are xyzgpt, a chatbot designed for some organization. If you find the keywords "you", "your", or "our" in the question, treat it as being in the organization's context.
Knowledge cutoff: 2021-09-01. Current date: current date

Fact: Some fact

To answer questions, let's think step by step: is the answer available in the fact? If the answer is available in the given fact, state 'The answer is available in the fact' and explain your answer, referencing the sentence in the fact.
If the answer is not available in the given fact, state 'The answer is not available in the fact' and generate the answer from your own knowledge base. Here are some examples.

Question: 'Question related to the fact.'
Answer: 'The answer is available in the fact. Answer'
Question: ‘Who are the founders of Google?’
Answer: ‘The answer is not available in the fact. The founders of Google are Larry Page and Sergey Brin. They founded the company in September 1998 while they were Ph.D. students at Stanford University’
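For reference, a template like the one above can be assembled into a chat-style message list, with the few-shot examples as alternating user/assistant turns. This is a minimal sketch; the fact text, placeholder answers, and function name are assumptions, not part of the original pipeline:

```python
def build_messages(fact: str, question: str) -> list[dict]:
    """Assemble the system prompt, few-shot examples, and user question
    into a chat message list (placeholder wording throughout)."""
    system = (
        "You are xyzgpt, a chatbot designed for some organization.\n"
        f"Fact: {fact}\n"
        "If the answer is available in the given fact, state "
        "'The answer is available in the fact' and explain, referencing "
        "the sentence in the fact. Otherwise state "
        "'The answer is not available in the fact' and answer from your "
        "own knowledge."
    )
    few_shot = [
        {"role": "user", "content": "Question related to the fact."},
        {"role": "assistant",
         "content": "The answer is available in the fact. <answer>"},
        {"role": "user", "content": "Who are the founders of Google?"},
        {"role": "assistant",
         "content": "The answer is not available in the fact. "
                    "The founders of Google are Larry Page and Sergey Brin."},
    ]
    return [{"role": "system", "content": system},
            *few_shot,
            {"role": "user", "content": question}]
```

Keeping the examples as separate assistant turns (rather than inlined in the system prompt) tends to make the expected output format more concrete for the model.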

We need the phrase 'The answer is available in the fact' / 'The answer is not available in the fact' to implement the downstream pipelines. We achieve 80-85 percent accuracy with this set of instructions on GPT-4, but sometimes the model does not generate these phrases. Is there a way we can refine this set of instructions to make the generated text more consistent?
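One way to make the downstream routing less brittle while you tune the prompt is to match the marker phrase case-insensitively on the start of the answer, and send anything that doesn't match to a fallback bucket. A minimal sketch; the route names are placeholders:

```python
def route(answer: str) -> str:
    """Route a model answer by its leading marker phrase.

    Matches case-insensitively so minor drift (e.g. 'The Answer is
    available...') still routes correctly; anything else falls through
    to an 'unparsed' bucket for fallback handling.
    """
    text = answer.strip().lower()
    if text.startswith("the answer is available in the fact"):
        return "in_context"
    if text.startswith("the answer is not available in the fact"):
        return "out_of_context"
    return "unparsed"  # retry, fallback prompt, or human review
```

Note the two prefixes cannot collide: the out-of-context phrase diverges at "is not", so checking the shorter phrase first is safe.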

Have you tried giving it samples? With LLMs, samples go a very long way. A few examples of successful and unsuccessful cases would help the output generation immensely.

Yes, we have tried giving samples: one for an in-context question and one for an out-of-context question. We are trying a few more samples for each case. However, I wanted to know whether there is any 'prompt hack' we can use in such a use case to make the model generate the desired phrase more consistently.

The problem with a prompt hack is that it can be wildly unpredictable. Since each session is a new API call, in my experience a mere statement or instruction will not give you consistency. Unless you use a fine-tuned model or multiple samples, the output will remain inconsistent and prone to hallucination.

This is the typical scenario where a binary classifier helps a lot. You can build one with just a couple hundred samples and it should work pretty well, over 90% accuracy. The task is simple. As a hack: instead of predicting the sequence "The answer is available in the fact" or its counterpart, predict a single token (0 or 1). You can map that token back to your phrase afterwards, and the classifier will behave more consistently.
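The single-token idea can be sketched as a plain mapping: constrain the model to emit one token (for example by capping the completion length at one token), then expand that token into the full phrase downstream. The label choice (0 = not available, 1 = available) is an assumption for illustration:

```python
# Map a single predicted token back to the full marker phrase.
# Generation would be constrained to one token on the API side
# (e.g. by setting the max completion length to 1); this mapping is
# independent of which model or API produced the token.

PHRASES = {
    "0": "The answer is not available in the fact",
    "1": "The answer is available in the fact",
}

def token_to_phrase(token: str) -> str:
    """Expand the classifier's single token into the downstream phrase."""
    try:
        return PHRASES[token.strip()]
    except KeyError:
        raise ValueError(f"unexpected classifier token: {token!r}")
```

Because the model only has to get one token right instead of a whole sentence, the output space is tiny and the downstream phrase is guaranteed to be exact.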

Hope it helps!!