I am in the process of building a chat bot that specialises in facilitating chats between two people in a workplace context. For example, imagine that two employees are working on a company project and they may chat about needs various information, some of which reside in the company’s knowledge base (e.g., staff handbook, process of setting up an Azure account for an employee, where to find certain datasets etc). The goal of the chat bot is to
- monitor the conversation between the two human users
- understand when the users need certain information, performs search in the KG and provide suggestions (e.g., it looks like you are looking for X, here is what I found…)
- moderate the chat to prevent unwanted discussion (e.g., breach of company rules like sharing customer information, ‘jailbreaking’ etc)
My question is more on item 3, which sounds like a guardrail. While I know you can build guardrail with the open ai API (How to implement LLM guardrails | OpenAI Cookbook) and third party apis like GuardrailAI, the case I am dealing with seems more complex: it is not catching one single message and stop it, but requires constant monitoring the chat, analysing it, and step in when needed.
I wonder if anyone worked on something similar before and can share your experience? I was thinking along the setup like
- a main bot responsible for chatting with the users
- another ‘agent’ that acts as a ‘moderator’ and when it detected a case it should step in, informs the chat bot and inject additional prompts dynamically, e.g., ‘your user said X, they should not do Y. Reply accordingly’
Does this work?
Many thanks!