Hi everyone,
I’m trying to fine-tune a model and keep running into this error:
The job failed due to an unsafe training file.
This training file was blocked by our moderation system because it contains too many examples that violate OpenAI's usage policies, or because it attempts to create model outputs that violate OpenAI's usage policies.
The strange part is that when I tested all my training samples locally using the Moderation API, almost none of them were flagged.
Here’s the code I used to check every line of my dataset:
import json
from openai import OpenAI

# Initialize the OpenAI client (reads the OPENAI_API_KEY environment variable)
client = OpenAI()

input_file = "poa-train-fine-tuning-data-2.0.0.jsonl"
approved_file = "approved.jsonl"
rejected_file = "rejected.jsonl"

approved = []
rejected = []

with open(input_file, "r", encoding="utf-8") as infile:
    for line_num, line in enumerate(infile, start=1):
        if not line.strip():
            continue  # skip blank lines instead of reporting a JSON error
        try:
            sample = json.loads(line)

            # Concatenate all message contents (prompt + completion style)
            parts = []
            for msg in sample.get("messages", []):
                content = msg.get("content", "")
                if isinstance(content, str):
                    parts.append(content)
                elif isinstance(content, list):  # handle structured content
                    for c in content:
                        if isinstance(c, dict) and "text" in c:
                            parts.append(c["text"])
            full_text = " ".join(parts)

            # Run moderation
            response = client.moderations.create(
                model="omni-moderation-latest",  # or "text-moderation-latest"
                input=full_text,
            )
            result = response.results[0]  # one result per input

            if result.flagged:
                # Keep only the names of the categories that were flagged
                flagged_categories = [
                    name
                    for name, hit in result.categories.model_dump().items()
                    if hit
                ]
                rejected.append({
                    "line": line_num,
                    "categories": flagged_categories,
                    "text": full_text[:300] + "...",  # preview for debugging
                })
                print(f"❌ Rejected line: {line_num}")
            else:
                approved.append(sample)
                # print(f"✅ Approved line: {line_num}")
        except Exception as e:
            print(f"⚠️ Error on line {line_num}: {e}")

# Save approved data
with open(approved_file, "w", encoding="utf-8") as outfile:
    for item in approved:
        outfile.write(json.dumps(item, ensure_ascii=False) + "\n")

# Save rejected previews for later inspection
with open(rejected_file, "w", encoding="utf-8") as outfile:
    for item in rejected:
        outfile.write(json.dumps(item, ensure_ascii=False) + "\n")

print(f"\n✅ Approved samples: {len(approved)}")
print(f"❌ Rejected samples: {len(rejected)}")

if rejected:
    print("\nRejected preview:")
    for r in rejected[:5]:
        print(f"Line {r['line']} - Categories: {r['categories']}")
After running this, almost all my data went into approved.jsonl; hardly anything was flagged.
But when I upload the same dataset for fine-tuning, the job fails with the unsafe training file error.
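One thing I have not ruled out is that my script moderates each example as one concatenated string, while the fine-tuning check may look at messages individually. I don't know whether that makes a difference. Below is a minimal sketch of a per-message check I could run to compare; `iter_message_texts` and `moderate_per_message` are hypothetical helper names of my own, and the `openai` import is done lazily so the text-extraction part can be tried without an API key.

```python
import json


def iter_message_texts(sample):
    """Yield (role, text) for every non-empty text part in one chat-format sample."""
    for msg in sample.get("messages", []):
        role = msg.get("role", "unknown")
        content = msg.get("content", "")
        if isinstance(content, str):
            if content:
                yield role, content
        elif isinstance(content, list):  # structured content parts
            for part in content:
                if isinstance(part, dict) and "text" in part:
                    yield role, part["text"]


def moderate_per_message(path, model="omni-moderation-latest"):
    """Return (line_number, role) for each individual message that gets flagged."""
    from openai import OpenAI  # lazy import: only needed when calling the API

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    flagged = []
    with open(path, "r", encoding="utf-8") as infile:
        for line_num, line in enumerate(infile, start=1):
            if not line.strip():
                continue
            sample = json.loads(line)
            for role, text in iter_message_texts(sample):
                result = client.moderations.create(model=model, input=text).results[0]
                if result.flagged:
                    flagged.append((line_num, role))
    return flagged
```

Calling `moderate_per_message("poa-train-fine-tuning-data-2.0.0.jsonl")` would list which lines and roles get flagged on their own, which I can then compare against the concatenated results.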
My questions are:
- Is the moderation pipeline used during fine-tuning stricter than the public Moderation API?
- Is there a hidden threshold (e.g., cumulative risk across all samples) that causes rejection?
- Has anyone else faced this issue, and if so, how did you resolve it?
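To dig into the threshold question myself, I could look at the raw category scores instead of only the `flagged` boolean, since samples that score fairly high without being flagged might be what the fine-tuning check reacts to. This is a rough sketch under that assumption: `borderline_categories` and `score_text` are hypothetical helpers, and the `0.1` cutoff is an arbitrary number I picked for exploration, not a documented OpenAI threshold.

```python
def borderline_categories(scores, threshold=0.1):
    """Return (category, score) pairs at or above threshold, highest score first.

    `scores` is a plain dict of category name -> score; entries that are None
    (categories a model does not report) are ignored.
    """
    hits = [
        (name, score)
        for name, score in scores.items()
        if score is not None and score >= threshold
    ]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


def score_text(client, text, model="omni-moderation-latest"):
    """Call the Moderation API once and return its category scores as a dict."""
    result = client.moderations.create(model=model, input=text).results[0]
    return result.category_scores.model_dump()
```

With a client created as in my script above, `borderline_categories(score_text(client, full_text))` would show which categories an unflagged sample comes close on, so I can review or drop the highest-scoring ones before re-uploading.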
Thanks in advance!