How does the content filter work?


I would like to know more about the GPT-3 content filter. I already read the OpenAI documentation about it, but I have further questions.

  • Does GPT-3 use a blacklist knowledge base of content flagged as unsafe or sensitive?
  • Do you use sentiment analysis to detect unsafe and sensitive content?
  • What role does user feedback play in the content filter? Do you use both sentiment analysis and feedback?
  • Is there human intervention at some point, or does GPT-3 flag content on its own?
  • Does user feedback match the sentiment analysis results? How often (as a percentage)?

Thank you in advance!

Hello Giada,

The content filter classifies text as safe, sensitive, or unsafe, and is currently built to err on the side of caution. That said, we aim to improve the content filter over time, and if/when more specific answers are available, we’ll add them to the documentation.
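To make the three-way classification concrete, here is a minimal sketch of how an "err on the side of caution" policy over those labels could look. The label names (safe / sensitive / unsafe) come from the answer above; the numeric label codes, the confidence score, and the threshold are hypothetical illustrations, not OpenAI's published implementation.

```python
# Hypothetical sketch: map a classifier's raw output to one of the three
# categories named above, erring on the side of caution. The numeric
# labels and the confidence threshold are assumptions for illustration.

LABELS = {0: "safe", 1: "sensitive", 2: "unsafe"}

def interpret(label: int, confidence: float, threshold: float = 0.7) -> str:
    """Return the category for a raw (label, confidence) pair.

    If the model predicts 'safe' but is not confident enough, treat the
    text as 'unsafe' instead -- that is the cautious behavior described
    in the answer.
    """
    if label == 0 and confidence < threshold:
        return "unsafe"
    return LABELS[label]
```

For example, `interpret(0, 0.95)` returns `"safe"`, but a low-confidence `interpret(0, 0.5)` is downgraded to `"unsafe"`.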