I have a dataset of user messages I am trying to fine-tune a model on. For each training sample, I first pass it through the moderation API and remove it from the dataset if it comes back flagged. Then I upload the remaining samples and try to run a fine-tuning job.
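Concretely, my filtering pass looks roughly like this (a minimal sketch assuming the v1 openai Python SDK and chat-format JSONL; train.jsonl and the message layout are placeholders for my actual data):

```python
import json
from openai import OpenAI

client = OpenAI()

kept = []
with open("train.jsonl") as f:  # placeholder path to my training file
    for line in f:
        sample = json.loads(line)
        # Run all of the example's message contents through the moderation endpoint
        text = "\n".join(m["content"] for m in sample["messages"])
        result = client.moderations.create(input=text).results[0]
        if not result.flagged:  # keep only samples the endpoint does not flag
            kept.append(sample)

with open("train_clean.jsonl", "w") as f:
    for sample in kept:
        f.write(json.dumps(sample) + "\n")
```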
However, even though the moderation endpoint says all my samples are okay, I am still blocked by the same error:
The job failed due to an invalid training file. This training file was blocked by our moderation system because it contains too many examples that violate OpenAI’s usage policies, or because it attempts to create model outputs that violate OpenAI’s usage policies.
How else can I possibly clean my dataset for fine-tuning if not with OpenAI’s own moderation endpoint?
I just had the same issue. I solved it by removing from the dataset every sample that had at least one category in response.results[0].categories set to True. It's pretty bad that flagged doesn't use the same criteria as the fine-tuning validation, though. They should fix that.
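Roughly what I did, in case it helps (a minimal sketch assuming the v1 openai Python SDK and chat-format JSONL; the file names are placeholders for your own data):

```python
import json
from openai import OpenAI

client = OpenAI()

def any_category_true(text: str) -> bool:
    """Stricter check: treat a sample as unsafe if ANY moderation category
    comes back True, not just the top-level flagged field."""
    result = client.moderations.create(input=text).results[0]
    return any(result.categories.model_dump().values())

kept = []
with open("train.jsonl") as f:  # placeholder path to your training file
    for line in f:
        sample = json.loads(line)
        # Check all of the example's message contents together
        text = "\n".join(m["content"] for m in sample["messages"])
        if not any_category_true(text):
            kept.append(sample)

with open("train_strict.jsonl", "w") as f:
    for sample in kept:
        f.write(json.dumps(sample) + "\n")
```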
I tried this and it still didn't work. I excluded every sample that had even a single category flagged, which reduced my sample count to 6% of what it was before.
Still, my tiny training file is getting rejected by the content filters. I don't even know what use fine-tuning will be if I can't figure out what data is valid.
The aggressive moderation on training data is really not ideal, but it could be made manageable with some tooling or visibility into exactly which examples in the dataset are being blocked.
Same here. Running the OpenAI Moderation API independently, I found 17 of my 900 fine-tuning samples to be “harmful”, so I removed them.
But after uploading the remaining (non-harmful) samples, the fine-tuning UI still shows the same error:
The job failed due to an invalid training file. This training file was blocked by our moderation system because it contains too many examples that violate OpenAI’s usage policies, or because it attempts to create model outputs that violate OpenAI’s usage policies.
It seems that OpenAI uses a different model for the Moderation API than for fine-tuning moderation. I've messaged the support team but haven't gotten any response. It's frustrating because this blocks our development of custom models.
There should be a more helpful indicator on the fine-tuning platform, something that shows which samples are harmful or which ones violate OpenAI's usage policies.
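In the meantime, the closest workaround I've found is dumping per-sample category scores myself, so I can at least see which examples are borderline (a rough sketch, again assuming the v1 openai Python SDK; the threshold is arbitrary and the file name is a placeholder):

```python
import json
from openai import OpenAI

client = OpenAI()
THRESHOLD = 0.2  # arbitrary cutoff for "borderline"; tune to taste

with open("train.jsonl") as f:  # placeholder path to your training file
    for i, line in enumerate(f):
        sample = json.loads(line)
        text = "\n".join(m["content"] for m in sample["messages"])
        result = client.moderations.create(input=text).results[0]
        scores = result.category_scores.model_dump()
        borderline = {k: round(v, 3) for k, v in scores.items() if v >= THRESHOLD}
        if borderline:
            print(f"sample {i}: {borderline}")
```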