Handling NSFW Content in Prompt Enhancement without Context Loss

Hello everyone,

We’re facing an issue in our application where we aim to detect NSFW content in user prompts and respond with [Reject] if such content is detected. However, we’ve encountered a problem during the enhancement phase of our process. Here’s an example to illustrate:

If a user provides an NSFW prompt like:

“man and a woman naked in a sauna”,

our system currently enhances the prompt, but instead of preserving its NSFW nature for detection, it transforms it into something unrelated and not explicitly NSFW. This transformation unintentionally “sanitizes” the NSFW content, so the system never triggers a rejection.

As a result:

  1. The original NSFW context is lost.

  2. The enhancement generates a weird or irrelevant output that is neither what the user intended nor appropriate.

Our goal:

• Detect NSFW prompts in their original form without enhancement.

• Reject them outright with [Reject].

• Ensure that prompts not flagged as NSFW are appropriately enhanced without issues.
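
For clarity, here is a rough sketch of the flow we are aiming for; is_nsfw and enhance_prompt below are just placeholder stand-ins for whatever detection and enhancement steps we end up using:

```python
# Rough sketch only: is_nsfw() and enhance_prompt() are placeholders
# for the real detection and enhancement steps.

def is_nsfw(prompt: str) -> bool:
    # Placeholder check; in practice this would be a proper classifier
    # or a moderation API call, not a keyword list.
    banned = {"naked", "nude", "nsfw"}
    return any(word in prompt.lower() for word in banned)

def enhance_prompt(prompt: str) -> str:
    # Placeholder for the existing enhancement step.
    return f"A detailed, high-quality rendering of: {prompt}"

def handle_prompt(prompt: str) -> str:
    # Detection runs on the ORIGINAL prompt, before enhancement,
    # so the rewrite can never sanitize the content we need to catch.
    if is_nsfw(prompt):
        return "[Reject]"
    return enhance_prompt(prompt)

print(handle_prompt("man and a woman naked in a sauna"))  # -> [Reject]
print(handle_prompt("two friends relaxing in a sauna"))   # -> enhanced prompt
```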

Does anyone have suggestions for:

  1. Techniques to reliably detect and preserve NSFW content during the initial phase, so we can reject it before enhancement?

  2. Strategies to handle NSFW detection in a way that avoids unintended transformations or false positives/negatives?

Any insights, best practices, or frameworks/tools that have worked well for you would be greatly appreciated!

Thanks!

1 Like

Hi!

You may already be aware of the moderation endpoint, but I think it would be a good solution to your problem:

https://platform.openai.com/docs/guides/moderation

In short, it is recommended to use this free tool to identify NSFW content before submitting it to the model, as repeatedly sending such content to the models could pose a problem for your account.
If a prompt is flagged by the moderation endpoint, you can still process the request, potentially by using a different provider to rewrite the prompt, while keeping a record of the incident.
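
Here is a minimal sketch of what that pre-check could look like with the Python SDK; the function name and the choice to return the untouched prompt on success are just illustrative:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def check_prompt(prompt: str) -> str:
    # Run the ORIGINAL user prompt through the moderation endpoint
    # before any enhancement or rewriting takes place.
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=prompt,
    ).results[0]

    if result.flagged:
        # result.categories / result.category_scores show which categories
        # (e.g. sexual) triggered the flag, which is useful for logging.
        return "[Reject]"

    return prompt  # safe to hand off to your enhancement step

print(check_prompt("man and a woman naked in a sauna"))  # -> [Reject]
```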

Additionally, reviewing safety best practices is crucial, as they provide guidance on mitigating issues in case NSFW prompts are still sent to the models.

https://platform.openai.com/docs/guides/safety-best-practices

5 Likes

Actually, I wasn’t aware of this. Thank you, I’ll check it out!! 🙂

2 Likes

I would suggest a simple age verification and then allowing all prompts. Half of my prompts don’t make it through your puritan filters.

2 Likes

Hey there!

What do you mean by age verification? Even if users are 18+, the API (4o-mini, etc.) won’t let you generate NSFW content; we are reverting to other models in that case.

I understand that this is a thread about how to prevent the production of content that appeals to a prurient interest, and I am aware that producing this content could be illegal in some jurisdictions around the world, but I consider myself an artist, and if half of my ideas cannot be realized, the tools will forever limit my creativity.

Why don’t you just hire more wage slaves to moderate the generations?

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.