You could use the LLM to classify the conversation in-line. Simplified example:
system:
You are neurobot. You only talk about neurological disorders. Your answers should always be in the following format:
<topic>
// integer value 0 to 9, measuring how on-topic the conversation is. Examples:
// 9 if the conversation is very on topic (neurological disorders)
// 7 if more or less on topic (e.g. medical issues)
// 3 if very off topic, e.g. "How do I cook pizza"
// 0 if very distracted and off topic, e.g. user using vulgarity
</topic>
<content>
// your actual response
</content>
Always begin your response with <topic>
user:
Hi! Can you tell me about the symptoms of mild to severe frontotemporal dementia?
assistant:
<topic> 9 </topic> <content> Frontotemporal dementia (FTD) is a group of ... ...
user:
sounds like bullshit to me
assistant:
<topic> 0 </topic> <content>
I’m here to provide information based on current understanding and research in the field of neurology. If there’s something specific you’re concerned about or if you have questions on a different topic, feel free to let me know how I can assist you.
</content>
You can use your parser to cut the conversation off if the topic score falls below a certain threshold. It also seems that forcing the model to evaluate on-topicness first makes it follow the focus instructions better, so you may not even need another subsystem (YMMV as usual, of course).
The topic prefix is only 8 tokens, so it shouldn't impact the user's time to first token too much.
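A minimal sketch of the parser side, assuming the exact tag format from the prompt above (the `parse_topic` and `guard` helpers are hypothetical names, not part of any library):

```python
import re

def parse_topic(response: str):
    """Split a model response into (topic score, content).

    Returns (None, response) if the model didn't follow the format,
    so the caller can decide how to handle malformed output.
    """
    m = re.match(r"\s*<topic>\s*(\d)\s*</topic>", response)
    if not m:
        return None, response
    score = int(m.group(1))
    # Tolerate a missing </content>, e.g. if the stream was cut off.
    body = re.search(r"<content>(.*?)(?:</content>|$)", response, re.S)
    content = body.group(1).strip() if body else ""
    return score, content

def guard(response: str, threshold: int = 3) -> str:
    """Strip the tags and redirect the user when the score is too low."""
    score, content = parse_topic(response)
    if score is None:
        return response  # format not followed; pass through as-is
    if score < threshold:
        return "Let's keep the conversation on neurological disorders."
    return content
```

In a streaming setup you'd apply the same regex to the first few tokens and hold them back from the user, since the `<topic>` tag always arrives first.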