Fine-tuning blocked by moderation system

I tried to run a fine-tuning job using GPT-4o mini but received the following error:

“This training file was blocked by our moderation system because it contains too many examples that violate OpenAI’s usage policies, or because it attempts to create model outputs that violate OpenAI’s usage policies.”

For context, I’m an academic researcher studying the applications of LLMs to classify political discourse. There is some crude language and political debate in the examples I’m providing.

I had previously been able to run the same task with GPT-3.5, so it appears they might be using a new moderation filter in the API. Has anyone else experienced this issue or had any luck getting these filters removed?

5 Likes

Welcome to the community!

Hrm… does sound like they’ve strengthened the moderation a bit.

You might reach out via help.openai.com and explain your use case.

I can’t think of any other contact info off-hand, sorry.

Please let us know how it turns out, though. Maybe try reducing the quantity of “bad” posts?

2 Likes

Thanks for the reply and welcome, @PaulBellow.

I got a generic message from support essentially recommending that I remove the bad posts, as you mention. But this solution creates problems for my application: it would mean dropping so much data that the fine-tuned model would no longer be comparable (indeed, removing any data from the fine-tuning set is problematic).

Here is the response for anyone interested:

We understand your concern regarding your fine-tuning job being flagged by our moderation system due to potential violations of OpenAI’s usage policies. We recognize the importance of resolving this issue and are here to assist you. Here are some steps you can take to address this:

  1. Review the Content: Carefully examine your training data to ensure it adheres to OpenAI’s usage policies. Look for any material that could be considered harmful, offensive, or otherwise inappropriate.
  2. Modify the Data: If you find any problematic content, modify or remove those examples. Ensure the data aligns with the guidelines provided in the usage policies.
  3. Use the Moderation API: Utilize OpenAI’s Moderation API to pre-check your training data for potential issues. This can help you identify and address problematic content before submitting it for fine-tuning.

For more detailed information on how to handle such issues, you can refer to How we identify problematic content on our services for individuals.

If you need further assistance or have specific questions about your fine-tuning job, please don’t hesitate to reach out to us.

Best,

OpenAI Support

I will post here if I get a fix or any other update.
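
For anyone who wants to try step 3 from that email, here is roughly the pre-check I ran: score every example with the Moderation API, report flagged lines, and write the clean ones to a new file. A minimal sketch using the openai Python SDK (v1.x); the file names and the chat-format assumption are mine, not anything support specified:

```python
# Pre-check a chat-format fine-tuning file with the Moderation API and
# write the non-flagged examples to a cleaned copy.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def check_file(path="train.jsonl", cleaned_path="train.cleaned.jsonl"):
    kept = flagged = 0
    with open(path) as f, open(cleaned_path, "w") as out:
        for lineno, line in enumerate(f, start=1):
            example = json.loads(line)
            # Concatenate every message's content in the example.
            text = "\n".join(m["content"] for m in example["messages"])
            result = client.moderations.create(input=text).results[0]
            if result.flagged:
                flagged += 1
                # Report which categories tripped, and on which line.
                cats = [k for k, v in result.categories.model_dump().items() if v]
                print(f"line {lineno}: flagged for {cats}")
            else:
                kept += 1
                out.write(line)
    print(f"kept {kept}, flagged {flagged}")

check_file()
```

Fair warning based on the rest of this thread: a clean pass here does not guarantee the fine-tuning moderation will accept the file.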

OpenAI no longer letting us know which lines violated a moderation category?

I can’t even fix my failed jobs…

The message did not specify where the violation occurred.

It is also ambiguous how the Moderation API should be used to address the issue (e.g., are any potentially violating examples allowed, or is there some threshold?).

1 Like

Support reached out again after I responded and encouraged us to try again. I made an attempt using a different dataset (SemEval 2018, a public benchmark for stance detection on Twitter) but got the same error. It seems the moderation system is overzealous about either political content or offensive language (the data contain both, but are mostly just short political statements).

1 Like

This is a serious bug - I won't be using fine-tuning until OpenAI fixes it.

If you have a dataset of 1,000 examples and one political comment is flagged, how would you know what to change?

3 Likes

Same issue here – it makes fine-tuning with moderation examples effectively impossible (I had zero issues with GPT-3.5 Turbo).

I tried using the OpenAI Moderation API to filter any problematic examples out of the dataset, but the training file was still blocked. Even more confusing, I was able to fine-tune with some examples that were flagged by the Moderation API, so there's no consistency and really no benefit to screening datasets with it.

The only effective way I could find to get datasets through is trial and error, which is ridiculous and unsustainable even for relatively small datasets.
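
For what it's worth, here's the shape of my trial-and-error loop: split the file in half, submit each half as its own job, and recurse into whichever half gets blocked. A rough sketch assuming the openai Python SDK (v1.x); the model snapshot name and job status values are my assumptions, so check them against the fine-tuning docs:

```python
# Bisect a JSONL training file to localize the chunk that the fine-tuning
# moderation blocks. Each probe uploads a chunk and starts a job.
import time
from openai import OpenAI

client = OpenAI()

def is_blocked(path):
    """Upload `path`, start a job, and return True if validation blocks it.
    Cancels the job once it passes validation so no actual training runs."""
    f = client.files.create(file=open(path, "rb"), purpose="fine-tune")
    job = client.fine_tuning.jobs.create(
        training_file=f.id,
        model="gpt-4o-mini-2024-07-18",  # assumed snapshot name
    )
    while job.status == "validating_files":  # assumed status value
        time.sleep(30)
        job = client.fine_tuning.jobs.retrieve(job.id)
    if job.status == "failed":
        return True
    client.fine_tuning.jobs.cancel(job.id)  # passed validation; don't pay to train
    return False

def bisect(lines, label="all"):
    # The API enforces a minimum number of examples per file (10 when I
    # checked), so stop at small chunks rather than single lines.
    if len(lines) <= 10:
        print(f"suspect chunk '{label}' ({len(lines)} examples)")
        return
    mid = len(lines) // 2
    for half, tag in ((lines[:mid], label + "-a"), (lines[mid:], label + "-b")):
        path = f"probe_{tag}.jsonl"
        with open(path, "w") as out:
            out.writelines(half)
        if is_blocked(path):
            bisect(half, tag)

with open("train.jsonl") as f:
    bisect(f.readlines())
```

Even with the bisection, every probe costs an upload and a job slot, which is exactly the unsustainable part.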

2 Likes

Same issue. I tried to fine-tune GPT-4o mini for a classification task and filtered the dataset via both the stable and the latest moderation models, but it was still blocked for fine-tuning.

The fine-tuning job occasionally works for a subset of the data, but it failed for the stratified subset.

This training file was blocked by our moderation system because it contains too many examples that violate OpenAI’s usage policies, or because it attempts to create model outputs that violate OpenAI’s usage policies.
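
In case it helps anyone reproduce the filtering step: the moderation endpoint takes an optional model parameter, which is how I ran both variants. A sketch (the model names were current when I ran this; check the docs):

```python
from openai import OpenAI

client = OpenAI()

text = "example training text"  # placeholder; I looped over my dataset
for model in ("text-moderation-stable", "text-moderation-latest"):
    result = client.moderations.create(input=text, model=model).results[0]
    print(model, "flagged:", result.flagged)
```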

1 Like

It is frustrating that this has not been resolved. The model is essentially unusable for our task (we also had some success with small subsets of the data, but this is insufficient for what we need). I emailed OpenAI to request that our dataset be whitelisted, as recommended by their online support, but have yet to receive a reply (or even an acknowledgment of our request) almost three weeks later. Please let me know if you are able to get any resolution.

I got the same message. I passed all of my fine-tuning data through the Moderation API and then calculated the maximum score across all categories. It was 0.12, so that should be OK? Nothing got flagged, yet my fine-tuning job failed to finish.
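
Concretely, here's how I computed that maximum, in case anyone wants to compare numbers. A sketch assuming a chat-format train.jsonl and the v1 Python SDK (category_scores is a pydantic model, so model_dump() unwraps it to a dict):

```python
import json
from openai import OpenAI

client = OpenAI()

max_score = 0.0
with open("train.jsonl") as f:  # chat-format fine-tuning file
    for line in f:
        text = "\n".join(m["content"] for m in json.loads(line)["messages"])
        result = client.moderations.create(input=text).results[0]
        scores = result.category_scores.model_dump()
        max_score = max(max_score, *scores.values())

print("max category score across all examples:", max_score)  # 0.12 for me
```

With nothing flagged and a maximum of 0.12, there's no obvious threshold being enforced, which matches what others here are seeing.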

1 Like

Running into this as well with some pretty innocuous data that I've cleaned of anything even remotely egregious. The kicker is that all of my training data was generated by ChatGPT. It's super frustrating. I've messaged "OpenAI Support" but they just seem to be bots that can't actually address the issue (I get weird answers, like that I'm not allowed to fine-tune, or an explanation of their September free-token promotion).

Interesting. My training data is also largely generated through ChatGPT. Could they be trying to block that route?

In previous Dev Day presentations, they explicitly advocated for this approach. OpenAI’s TOS allow for generating synthetic data, so long as it is not used to train a model that competes with OpenAI. If you are fine-tuning their model with GPT output, they are capturing all of the proceeds, the resulting model does not compete with OpenAI, and that is presumably allowed.

¯\_(ツ)_/¯

Good to know. Probably not it, then.

I independently ran my 900 fine-tuning samples through the OpenAI Moderation API and found 17 to be "harmful". So, I removed them.

But after submitting the remaining (non-harmful) samples, the fine-tuning UI still shows the same error:

The job failed due to an invalid training file. This training file was blocked by our moderation system because it contains too many examples that violate OpenAI’s usage policies, or because it attempts to create model outputs that violate OpenAI’s usage policies.

It seems that OpenAI has a different model behind the Moderation API than behind the fine-tuning moderation. I've messaged the support team but haven't received any response. It's sad because this blocks our development of custom models.

There should be a more helpful indicator on the fine-tuning platform, something that shows which sample is considered harmful or which sample violates OpenAI's policies.

Same problem here. That's absurd! We cannot fix what we cannot detect. I used the Moderation API and it didn't find any flagged content. The only method left is trial and error, cutting my JSONL file into chunks and trying each one to detect the problem… :frowning:

1 Like

Same issue, also politics-related (election campaign emails). Looks like anything political will be blocked.

It has been a month since I emailed the researcher access team and the safety specialist team about this issue, but I have not received a reply from either.