Handling NSFW Content in Prompt Enhancement without Context Loss

Hello everyone,

We’re facing an issue in our application where we aim to detect NSFW content in user prompts and respond with [Reject] if such content is detected. However, we’ve encountered a problem during the enhancement phase of our process. Here’s an example to illustrate:

If a user provides an NSFW prompt like:

“man and a woman naked in a sauna”,

our system currently enhances the prompt, but instead of preserving its NSFW nature for detection, it transforms it into something unrelated and not explicitly NSFW. This transformation unintentionally “sanitizes” the NSFW content, so the system never triggers a rejection.

As a result:

  1. The original NSFW context is lost.

  2. The enhancement generates a weird or irrelevant output that is neither what the user intended nor appropriate.

Our goal:

• Detect NSFW prompts in their original form without enhancement.

• Reject them outright with [Reject].

• Ensure that prompts not flagged as NSFW are appropriately enhanced without issues.
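
For clarity, here is a rough sketch of the flow we are aiming for; is_nsfw and enhance_prompt below are just placeholder stand-ins for whatever detection and enhancement steps we end up using:

```python
# Rough sketch only: is_nsfw() and enhance_prompt() are placeholders
# for the real detection and enhancement steps.

def is_nsfw(prompt: str) -> bool:
    # Placeholder check; in practice this would be a proper classifier
    # or a moderation API call, not a keyword list.
    banned = {"naked", "nude", "nsfw"}
    return any(word in prompt.lower() for word in banned)

def enhance_prompt(prompt: str) -> str:
    # Placeholder for the existing enhancement step.
    return f"A detailed, high-quality rendering of: {prompt}"

def handle_prompt(prompt: str) -> str:
    # Detection runs on the ORIGINAL prompt, before enhancement,
    # so the rewrite can never sanitize the content we need to catch.
    if is_nsfw(prompt):
        return "[Reject]"
    return enhance_prompt(prompt)

print(handle_prompt("man and a woman naked in a sauna"))  # -> [Reject]
print(handle_prompt("two friends relaxing in a sauna"))   # -> enhanced prompt
```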

Does anyone have suggestions for:

  1. Techniques to reliably detect and preserve NSFW content during the initial phase, so we can reject it before enhancement?

  2. Strategies to handle NSFW detection in a way that avoids unintended transformations or false positives/negatives?

Any insights, best practices, or frameworks/tools that have worked well for you would be greatly appreciated!

Thanks!

1 Like

Hi!

You may already be aware of the moderation endpoint, but I think it would be a good solution to your problem:

https://platform.openai.com/docs/guides/moderation

In short, it is recommended to use this free tool to identify NSFW content before submitting it to the model, as repeatedly sending such content to the models could pose a problem for your account.
If a prompt is flagged by the moderation endpoint, you can still process the request, potentially by using a different provider to rewrite the prompt, while keeping a record of the incident.
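
Here is a minimal sketch of what that pre-check could look like with the Python SDK; the function name and the choice to return the untouched prompt on success are just illustrative:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def check_prompt(prompt: str) -> str:
    # Run the ORIGINAL user prompt through the moderation endpoint
    # before any enhancement or rewriting takes place.
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=prompt,
    ).results[0]

    if result.flagged:
        # result.categories / result.category_scores show which categories
        # (e.g. sexual) triggered the flag, which is useful for logging.
        return "[Reject]"

    return prompt  # safe to hand off to your enhancement step

print(check_prompt("man and a woman naked in a sauna"))  # -> [Reject]
```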

Additionally, reviewing safety best practices is crucial, as they provide guidance on mitigating issues in case NSFW prompts are still sent to the models.

https://platform.openai.com/docs/guides/safety-best-practices

5 Likes

Actually, I wasn’t aware of this. Thank you, I’ll check it out!! 🙂

2 Likes

I would suggest a simple age verification and then allowing all prompts. Half of my prompts don’t make it through your puritan filters.

2 Likes

Hey there!

What do you mean by age verification? Even if users are 18+, the API (4o-mini, etc.) won’t let you generate NSFW content; we are reverting to other models in that case.

I understand that this is a thread about how to prevent the production of content that appeals to a prurient interest, and I am aware that producing this content could be illegal in some jurisdictions around the world, but I consider myself an artist, and if half of my ideas cannot be realized, the tools will forever limit my creativity.

Why don’t you just hire more wage slaves to moderate the generations?

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.