API endpoint to replicate examples on the "GPT-4 for content moderation" blogpost

ppaudel · August 21, 2023, 6:25pm

Really interesting piece of use case demonstrated in this blog post. Using GPT-4 for content moderation. I was wondering how can I replicate something similar.
The Moderation endpoint (OpenAI Platform) seems to return scores on the pre-defined list of categories. But , similar to how it is demonstrated in the example blogpost, how can I provide my own content policy?
Or do I just provide the content policy as part of my prompt, and ask to classify which policy it violated (without using the moderation endpoint)?
Any pointers would be appreciated.
Thanks

Foxalabs · August 21, 2023, 8:03pm

Hi and welcome to the developer forum!

The system described is not using the moderation endpoint directly, it is a combination of a trained smaller model (this bit I am still seeking clarity on) and GPT-4 as the trainer using a set of example scenarios and a pre classified set of desired responses. The smaller model can then be fine tuned to respond in the desired fashion.

I’ll go thought it in more detail and try and give you a round up of methods.

_j · August 21, 2023, 8:53pm

The article doesn’t provide a solution, nor even a paper. It is just an exploration - they want you to burn through GPT-4 and fine tune a base model based on the results so you can pay for your own experiment in making a moderator.

From the chart, it seems some work was done, but no end product for you, only for them.

Every example taken to give positives or negatives requires human moderation results and human review of GPT-4 quality against the human moderator to then try to teach GPT-4 with better human-written rules to reduce errors in moderation. And of course you can’t train a GPT-4 itself beyond a prompt, therefore your GPT-4 with better system rules is not the product.

Why even involve the language AI when you’ve already created a human set of pass/no-pass to train a base model on?

Topic		Replies	Views
Custom content moderation - like but not based on OpenAI model API	1	698	December 12, 2024
Stuck in Post Moderation Mud – Anyone Got a Winch? 🆘 API gpt-4	4	584	April 14, 2024
Need Help: Facing OpenAI Usage Violation Due to user's Abuse API moderation , best-practices , gpt-4o-mini	11	858	November 17, 2024
Clarification on Using Moderation Model to Avoid Policy Violations API gpt-4 , api	3	487	October 9, 2024
User Content Review and Analysis API gpt-4	4	592	February 7, 2024

API endpoint to replicate examples on the "GPT-4 for content moderation" blogpost

Related topics