Building a chatbot that needs to respond to user messages that are censored

Thank you so much! Your prompt has already helped me identify some issues with mine.

You mentioned training models that understand the intent and catch trigger words. I wonder: what do you do with the cases they catch? I imagine that if you send that content to the LLM backend, it will still be problematic.

I’ve been thinking of having local models do ‘style transfer’: if we catch content that would violate the LLM usage policies, we use these local models to rephrase it so that the meaning is preserved but the tone/sentiment is ‘neutralised’, and then send the rephrased version to the backend. Something like the sketch below.
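
For concreteness, here is a rough sketch of the two-stage pipeline I have in mind, using Hugging Face `pipeline`s. The model names, the "toxic" label, and the 0.8 threshold are placeholders I picked for illustration, not things I've validated:

```python
# Hypothetical two-stage pipeline: a local classifier flags risky
# messages, and a local seq2seq model rephrases them before they
# ever reach the backend LLM.
from transformers import pipeline

# Stage 1: local moderation classifier.
# "unitary/toxic-bert" is just an example; any text-classification
# model with a suitable label scheme could be swapped in.
flagger = pipeline("text-classification", model="unitary/toxic-bert")

# Stage 2: local paraphraser used as a crude 'style transfer' step.
# Again, the model name is a placeholder for whatever local
# rephrasing model you actually train or pick.
rephraser = pipeline(
    "text2text-generation",
    model="humarin/chatgpt_paraphraser_on_T5_base",
)

def neutralise(message: str, threshold: float = 0.8) -> str:
    """Return the message unchanged if it looks safe; otherwise
    return a meaning-preserving rephrasing with the tone toned down."""
    result = flagger(message)[0]
    if result["label"] == "toxic" and result["score"] >= threshold:
        rewritten = rephraser(message, max_new_tokens=128)
        return rewritten[0]["generated_text"]
    return message

# The neutralised text is what actually gets sent to the backend LLM.
safe_text = neutralise("some user message here")
```

The idea is that the backend only ever sees the rephrased text, so its own moderation layer (hopefully) has less to object to.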

However, I suppose that still can’t address the situations where the backend LLM’s content moderation makes wrong predictions or filters our content too ‘aggressively’, which I know happens more often than we’d like (e.g. Fine-tuning blocked by moderation system).
