I tried to run a fine-tuning job using GPT-4o mini but received the following error:
“This training file was blocked by our moderation system because it contains too many examples that violate OpenAI’s usage policies, or because it attempts to create model outputs that violate OpenAI’s usage policies.”
For context, I’m an academic researcher studying the applications of LLMs to classify political discourse. There is some crude language and political debate in the examples I’m providing.
I had previously been able to run the same task with GPT-3.5, so it appears they might be using a new moderation filter in the API. Has anyone else experienced this issue, or had luck getting these filters removed?
I got a generic message from support essentially recommending that I remove the bad posts, as you mention. But this solution creates problems for my application, because it would mean dropping so much data that the fine-tuned model is no longer comparable (indeed, removing any data from the fine-tuning set is problematic).
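For reference, the job itself was created the standard way through the API; roughly the sketch below (the file name is a placeholder), and the moderation error only shows up after the file uploads successfully and the job is validated:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the JSONL training file, then start the fine-tuning job.
training_file = client.files.create(
    file=open("train.jsonl", "rb"),  # placeholder file name
    purpose="fine-tune",
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # the fine-tunable 4o-mini snapshot
)
print(job.id, job.status)
```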
Here is the response for anyone interested:
We understand your concern regarding your fine-tuning job being flagged by our moderation system due to potential violations of OpenAI’s usage policies. We recognize the importance of resolving this issue and are here to assist you. Here are some steps you can take to address this:
Review the Content: Carefully examine your training data to ensure it adheres to OpenAI’s usage policies. Look for any material that could be considered harmful, offensive, or otherwise inappropriate.
Modify the Data: If you find any problematic content, modify or remove those examples. Ensure the data aligns with the guidelines provided in the usage policies.
Use the Moderation API: Utilize OpenAI’s Moderation API to pre-check your training data for potential issues. This can help you identify and address problematic content before submitting it for fine-tuning.
The message did not specify where the violation occurred.
It is also ambiguous how the Moderation API should be used to address the issue (e.g., are any potentially violating examples allowed, or is there some threshold?).
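For anyone trying to follow step 3, here is a minimal sketch of what pre-checking a chat-format JSONL training file with the Moderation API looks like (the file name and the per-example flagged check are my own assumptions); what remains unclear is what threshold or fraction of flagged examples the fine-tuning check actually tolerates:

```python
import json
from openai import OpenAI

client = OpenAI()

flagged = []
with open("train.jsonl") as f:  # placeholder file name, chat-format JSONL
    for i, line in enumerate(f):
        example = json.loads(line)
        # Concatenate every message's content so the whole example is checked
        text = " ".join(m.get("content", "") for m in example["messages"])
        result = client.moderations.create(input=text).results[0]
        if result.flagged:
            flagged.append(i)

print(f"{len(flagged)} flagged examples: {flagged}")
```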
Support reached out again after I responded and encouraged us to try again. I made an attempt using a different dataset (SemEval 2018, a public benchmark for stance detection on Twitter) but got the same error. It seems like the moderation system is overzealous about either political content or offensive language (the data contain both, but are mostly just short political statements).
Same issue here – it makes fine-tuning with moderation examples effectively impossible (I had zero issues with GPT-3.5 Turbo).
I tried using the OpenAI Moderation API to filter out any problematic examples in the dataset, but the training file was still blocked. Even more confusing, I was able to fine-tune with some examples that were flagged by the Moderation API, so there's no consistency and really no benefit to screening datasets with it.
The only effective way I could find to get datasets through is trial and error, which is ridiculous and unsustainable for even relatively small datasets.
Same issue. I tried to fine-tune 4o-mini for a classification task and filtered the dataset via both the stable and the latest Moderation API models, but it was still blocked for fine-tuning.
The fine-tune job occasionally worked for a subset of the data, but failed for the stratified subset.
This training file was blocked by our moderation system because it contains too many examples that violate OpenAI’s usage policies, or because it attempts to create model outputs that violate OpenAI’s usage policies.
It is frustrating that this has not been resolved. The model is essentially unusable for our task (we also had some success with small subsets of the data, but this is insufficient for what we need). I emailed OpenAI to request that our dataset be whitelisted, as recommended by their online support, but have yet to receive a reply (or even an acknowledgment of our request) almost three weeks later. Please let me know if you are able to get any resolution.
I got the same message. I passed all of my fine-tuning data through the Moderation API and then calculated the maximum score across all of the categories. It was 0.12, so that should be OK? Nothing got flagged, yet my fine-tuning job still failed.
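For what it's worth, this is roughly how that maximum can be computed; I'm assuming the category_scores object on the moderation result can be dumped to a plain dict of per-category scores via model_dump(), and the file name is a placeholder:

```python
import json
from openai import OpenAI

client = OpenAI()

max_score = 0.0
with open("train.jsonl") as f:  # placeholder file name, chat-format JSONL
    for line in f:
        example = json.loads(line)
        text = " ".join(m.get("content", "") for m in example["messages"])
        result = client.moderations.create(input=text).results[0]
        # One score per policy category; keep the largest seen anywhere
        scores = result.category_scores.model_dump()
        max_score = max(max_score, max(v for v in scores.values() if v is not None))

print(f"Highest moderation score in the whole file: {max_score:.2f}")
```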
Running into this as well with some pretty innocuous data that I've cleaned of anything even remotely egregious. The kicker is that all of my training data was generated by ChatGPT. It's super frustrating. I've messaged "OpenAI Support", but they just seem to be bots that can't actually address the issue (I get weird answers, like that I'm not allowed to fine-tune, or an explanation of their September free-token promotion).
In previous Dev Day presentations, they explicitly advocated for this approach. OpenAI's TOS allow generating synthetic data, so long as it is not used to train a model that competes with OpenAI. If you are fine-tuning their model with GPT output, they capture all of the proceeds, the resulting model does not compete with OpenAI, and that is presumably allowed.
Running the OpenAI Moderation API independently, I found 17 of my 900 fine-tuning samples flagged as "harmful", so I removed them.
But even after submitting only the remaining (non-flagged) samples, the fine-tuning UI still shows the same error:
The job failed due to an invalid training file. This training file was blocked by our moderation system because it contains too many examples that violate OpenAI’s usage policies, or because it attempts to create model outputs that violate OpenAI’s usage policies.
It seems that OpenAI uses a different model behind the Moderation API than behind the fine-tuning moderation. I've messaged the support team but haven't received any response. It's sad, because this blocks our development of custom models.
There should be a more helpful indicator on the fine-tuning platform, something that shows which sample is harmful or violates any of OpenAI's policies.
Same problem here. That's absurd! We cannot fix what we cannot detect. I have used the Moderation API and it didn't find any flagged content. The only option is trial and error: cutting my JSONL file into chunks and trying each one to detect the problem…
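In case it helps anyone doing the same thing, here's a rough sketch of the splitting step (chunk size and file names are arbitrary); each chunk then has to be uploaded and fine-tuned separately to see which ones get blocked:

```python
CHUNK_SIZE = 100  # arbitrary; smaller chunks narrow down the culprit faster

with open("train.jsonl") as f:  # placeholder file name
    lines = f.readlines()

# Write each chunk to its own JSONL file; each one is then uploaded and
# fine-tuned separately to see which chunks the moderation system rejects.
for start in range(0, len(lines), CHUNK_SIZE):
    with open(f"chunk_{start // CHUNK_SIZE:03d}.jsonl", "w") as out:
        out.writelines(lines[start:start + CHUNK_SIZE])
```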
It has been a month since I emailed the researcher access team and the safety specialist team about this issue but I have not received a reply from either.