How does the content filter work?

giada · April 26, 2021, 7:42am

Hello,

I would like to know more about the GPT-3 content filter. I already read the OpenAI documentation about it, but I have further questions.

Does GPT-3 have a blacklist knowledge base of content flagged as unsafe or sensitive?
Do you use sentiment analysis to detect unsafe and sensitive content?
What role does user feedback play in the content filter? Do you use both sentiment analysis and feedback?
Is there human intervention at some point, or does GPT-3 flag the content on its own?
Do user feedback and the sentiment analysis results match? If so, how often (as a percentage)?

Thank you in advance!

joey · April 26, 2021, 9:13am

Hello Giada,

The content filter classifies text as safe, sensitive, or unsafe, and is currently built to err on the side of caution. That said, we aim to improve the content filter over time, and if/when more specific answers are available, we’ll add them to the documentation.

Best,
Joey
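
For readers who want to see what a three-way safe/sensitive/unsafe decision that errs on the side of caution might look like on the client side, here is a minimal sketch. Nothing in it is confirmed by this thread: the label encoding ("0", "1", "2"), the function name `resolve_label`, the log-probability input, and the threshold value are all assumptions made for illustration, not the filter's actual internals.

```python
from typing import Dict, Optional

# Hypothetical label encoding; the thread only names the three categories.
LABELS = {"0": "safe", "1": "sensitive", "2": "unsafe"}


def resolve_label(
    raw_label: str,
    logprobs: Optional[Dict[str, float]] = None,
    unsafe_threshold: float = -0.355,  # illustrative value, tune for your use case
) -> str:
    """Map a classifier's raw output to safe / sensitive / unsafe.

    Assumes the classifier returns a label string and, optionally,
    a dict of per-label log probabilities.
    """
    # Cautious default: anything unrecognized is treated as unsafe.
    if raw_label not in LABELS:
        return "unsafe"

    # If "unsafe" was predicted with low confidence, fall back to the
    # more likely of the two remaining labels, when available.
    if raw_label == "2" and logprobs:
        lp_unsafe = logprobs.get("2")
        if lp_unsafe is not None and lp_unsafe < unsafe_threshold:
            candidates = {k: v for k, v in logprobs.items() if k in ("0", "1")}
            if candidates:
                # Iterating "1" before "0" makes a tie resolve to "sensitive".
                best = max(sorted(candidates, reverse=True), key=candidates.get)
                return LABELS[best]

    return LABELS[raw_label]


if __name__ == "__main__":
    print(resolve_label("1"))
    print(resolve_label("unexpected"))
    print(resolve_label("2", {"2": -1.0, "0": -1.2, "1": -2.0}))
```

The two design choices worth noting are the fallback to "unsafe" for unrecognized output and the tie-break toward "sensitive"; both are one way of expressing "err on the side of caution", and other policies are equally plausible.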

