Public Moderation API Passed, But Fine-Tuning File Still Rejected – Why?

Hi everyone,

I’m trying to fine-tune a model and keep running into this error:

The job failed due to an unsafe training file. 
This training file was blocked by our moderation system because it contains too many examples that violate OpenAI's usage policies, or because it attempts to create model outputs that violate OpenAI's usage policies.

The strange part is that when I tested all my training samples locally using the Moderation API, none of them were flagged.
Here’s the code I used to check every line of my dataset:

import json
from openai import OpenAI

# Reads OPENAI_API_KEY from the environment
client = OpenAI()

input_file = "training_data.jsonl"
output_file = "approved_output.jsonl"

approved = []
rejected = []

with open(input_file, "r", encoding="utf-8") as infile:
    for line_num, line in enumerate(infile, start=1):
        try:
            sample = json.loads(line)

            # Concatenate all message contents (prompt + completion style)
            parts = []
            for msg in sample.get("messages", []):
                content = msg.get("content", "")
                if isinstance(content, str):
                    parts.append(content)
                elif isinstance(content, list):  # handle structured content
                    for c in content:
                        if isinstance(c, dict) and "text" in c:
                            parts.append(c["text"])
            full_text = " ".join(parts)

            # Run moderation
            response = client.moderations.create(
                model="omni-moderation-latest",  # or "text-moderation-latest"
                input=full_text
            )

            result = response.results[0]  # SDK returns list-like results

            if result.flagged:
                rejected.append({
                    "line": line_num,
                    "categories": result.categories,
                    "text": full_text[:300] + "..."  # preview for debugging
                })
                print(f"❌ Rejected line: {line_num}")
            else:
                approved.append(sample)
                # print(f"✅ Approved line: {line_num}")

        except Exception as e:
            print(f"⚠️ Error on line {line_num}: {e}")

# Save approved data
with open(output_file, "w", encoding="utf-8") as outfile:
    for item in approved:
        outfile.write(json.dumps(item, ensure_ascii=False) + "\n")

print(f"\n✅ Approved samples: {len(approved)}")
print(f"❌ Rejected samples: {len(rejected)}")

if rejected:
    print("\nRejected preview:")
    for r in rejected[:5]:
        print(f"Line {r['line']} - Categories: {r['categories']}")

After running this, almost all of my data passed — hardly anything was flagged.

But when I upload the same dataset for fine-tuning, the job fails with the unsafe training file error.
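One thing I'm wondering about: my script concatenates every message in a sample into one string before moderating, which might dilute a single bad message inside a long conversation. The Moderation API also accepts a list of strings as `input`, so each message could be scored on its own. A small helper to pull out per-message texts (the function name is mine, not from any SDK):

```python
def message_texts(sample: dict) -> list:
    """Extract each message's text separately, so every message
    can be sent to the Moderation API as its own input item."""
    texts = []
    for msg in sample.get("messages", []):
        content = msg.get("content", "")
        if isinstance(content, str):
            texts.append(content)
        elif isinstance(content, list):  # structured content parts
            texts.extend(c["text"] for c in content
                         if isinstance(c, dict) and "text" in c)
    return texts

# Example: one plain-string message and one structured message
sample = {"messages": [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": [{"type": "text", "text": "hello"}]},
]}
print(message_texts(sample))  # ['hi', 'hello']
```

Then the loop could call `client.moderations.create(model="omni-moderation-latest", input=message_texts(sample))` and check `r.flagged` for each entry in `response.results`, instead of moderating one concatenated blob.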

My questions are:

  1. Is the moderation pipeline used during fine-tuning stricter than the public Moderation API?

  2. Is there a hidden threshold (e.g., cumulative risk across all samples) that causes rejection?

  3. Has anyone else faced this issue and how did you resolve it?
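In case question 2 is the culprit, I'm also planning to look at the raw `category_scores` the Moderation API returns, not just the boolean `flagged` field. If the fine-tuning pipeline applies stricter (lower) thresholds, samples that pass `flagged` could still sit near the line. A sketch of what I mean — the 0.4 cutoff is purely my guess, not a documented value:

```python
BORDERLINE_THRESHOLD = 0.4  # assumption: stricter than the public default

def borderline_categories(category_scores: dict, threshold: float = BORDERLINE_THRESHOLD) -> dict:
    """Return categories whose score meets or exceeds `threshold`,
    even when the result was not flagged by the public API."""
    return {cat: score for cat, score in category_scores.items()
            if score >= threshold}

# Example with a mock dict shaped like a moderation result's category scores
scores = {"harassment": 0.05, "violence": 0.55, "sexual": 0.01}
print(borderline_categories(scores))  # {'violence': 0.55}
```

In the loop, something like `borderline_categories(result.category_scores.model_dump())` (the results are Pydantic models in the current SDK) would let me manually review any near-threshold samples before uploading.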

Thanks in advance!


Same situation here. The training-file check seems overly strict. I don't know if OpenAI updated their policies recently, but I hadn't run into this many safety-check errors before.
