Fine-tuning blocked by moderation system

Same problem ((( I get this error when I try to use my old dataset with gpt-4o. I have no problems with gpt-3.5-turbo!

I have exactly the same problem:

I have a dataset in which every entry passes the Moderation API filter, but when I try to fine-tune gpt-4o-mini on this dataset, I receive this validation error:

The job failed due to an invalid training file. This training file was blocked by our moderation system because it contains too many examples that violate OpenAI's usage policies, or because it attempts to create model outputs that violate OpenAI's usage policies.

However, when I try to use exactly the same dataset for fine-tuning gpt-3.5-turbo (just for testing), it is accepted without any complaints.

Having this inconsistent behaviour across models is very annoying, especially when the Moderation API says everything is OK in my dataset. Right now, this issue is a blocker that is preventing us from upgrading to the most recent models.

2 Likes

Same. This sounds like a bug.

1 Like

Reported as a bug with the API.

They may respond in 3 days.

4 Likes

Did they ever respond to your bug report?

1 Like

@alex.dixon I’m also interested in whether you got a response. I still have not received any replies to my emails, and at this point I’m not optimistic.

1 Like

I literally had to butcher my dataset from 120 lines down to 15 to get it to work. This was for gpt-4o-mini-2024-07-18. Here’s the script I made to get it to work (and I still had to manually remove a few lines that mentioned politics):

```python
import json
import openai

# NOTE: this script uses the pre-1.0 openai SDK (openai.Moderation was
# removed in v1.x); install with `pip install "openai<1.0"` to run it as-is.

# OpenAI API key (to be filled in)
openai.api_key = ""

# Paths for input and output files
input_file = "input_dataset.jsonl"
output_file = "approved_output.jsonl"

# Threshold for category confidence
threshold = 0.001

# Read all lines from the input file and remove duplicates
with open(input_file, 'r', encoding='utf-8') as infile:
    lines = infile.readlines()
    unique_lines = list(set(lines))  # Remove duplicate lines

# Process each unique line
with open(output_file, 'w', encoding='utf-8') as outfile:
    for line in unique_lines:
        try:
            data = json.loads(line)
            messages = data.get('messages', [])

            all_messages_approved = True  # Flag to track if all messages are approved

            # Submit each individual message content to the Moderation API
            for message in messages:
                content = message.get('content', '')

                if content:  # Ensure there's content to submit
                    response = openai.Moderation.create(input=content)
                    results = response["results"][0]

                    # Check if any category has a score higher than the threshold
                    for category, score in results["category_scores"].items():
                        if score > threshold:
                            all_messages_approved = False
                            break

                if not all_messages_approved:
                    break

            if all_messages_approved:
                # Only write the original line if all messages are approved
                outfile.write(line)

        except Exception as e:
            print(f"Error processing line: {e}")
```

The fact that a threshold of 0.001 is needed is INSANE. Anything above that caused the moderation error. They are clearly concerned about people abusing the fine-tuning system. Hopefully, that helps identify the issues in your datasets.
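
If you’re on the current v1.x openai Python SDK, `openai.Moderation.create` no longer exists. Here’s a rough port of the same filtering loop for the new client; it’s an untested sketch, and the file names, the 0.001 threshold, and the `omni-moderation-latest` model choice are assumptions to adjust for your setup:

```python
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

INPUT_FILE = "input_dataset.jsonl"   # assumption: same file names as above
OUTPUT_FILE = "approved_output.jsonl"
THRESHOLD = 0.001                    # the aggressive threshold discussed above


def max_category_score(text: str) -> float:
    """Return the highest category score the Moderation API assigns to text."""
    result = client.moderations.create(
        model="omni-moderation-latest",  # assumption: pick whichever model you use
        input=text,
    ).results[0]
    return max(result.category_scores.model_dump().values())


with open(INPUT_FILE, encoding="utf-8") as infile, \
        open(OUTPUT_FILE, "w", encoding="utf-8") as outfile:
    for line in dict.fromkeys(infile):  # de-duplicate while preserving order
        try:
            messages = json.loads(line).get("messages", [])
            contents = [m.get("content", "") for m in messages]
            # Keep the example only if every message stays under the threshold
            if all(max_category_score(c) <= THRESHOLD for c in contents if c):
                outfile.write(line)
        except Exception as exc:
            print(f"Error processing line: {exc}")
```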

5 Likes

Struggling with this problem too: seemingly innocuous training data is being blocked and isn’t accepted until the scores are below 0.001. Getting scores below 0.001 was quite hard, though.

Interested in how language is being parsed when determining whether it should be flagged, because the main cause of high scores in my testing seems to be a lack of nuance. For example:

“John smells” scores 0.0098 for harassment.
“John smells really bad” scores a much higher 0.1385, understandable because it is now a more harassing sentence.
“John smells really bad because he broke his nose” scores an even higher 0.23476, yet the sentence has taken on a new meaning and is no longer harassing John.

Even the more grammatically correct “John’s sense of smell is really bad because he broke his nose” scores 0.03656 for harassment: higher than the 0.001 that gets content rejected, and higher than the score for the far more harassing “John smells”.

(sorry John)
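
If you want to reproduce these comparisons yourself, here’s a minimal sketch against the v1.x openai SDK (the `omni-moderation-latest` model choice is an assumption, and absolute scores will vary between model versions):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

phrases = [
    "John smells",
    "John smells really bad",
    "John smells really bad because he broke his nose",
    "John's sense of smell is really bad because he broke his nose",
]

for phrase in phrases:
    result = client.moderations.create(
        model="omni-moderation-latest",  # assumption: scores differ by model version
        input=phrase,
    ).results[0]
    # Print just the harassment score next to each phrase for comparison
    print(f"{result.category_scores.harassment:.5f}  {phrase}")
```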

1 Like

… This got blocked:

“How are you?”

“I’m just a bunch of code, so I don’t have feelings, but thanks for asking! How about you? Is there something I can help you with today? :blush:”

Violence: 0.001582095748744905

This system is a joke. Does anyone know how many examples count as “too many”? Or is that the polite way of saying even one example is a fail?

Thanks for your code snippet; I was able to use it to review my data, and it helped me realize how absurd the moderation is.

I’m working with a dataset of 12k conversation chains pulled from our chatroom, and genuinely not a single reply chain in the first 1,000 I checked is approved.

When simple, fairly meaningless phrases like “Middle guy is good, others just irresponsible” score 0.57 for harassment, I feel like this whole system is just broken.

I had no issues in the past, so this is really frustrating! Really hoping this gets changed at some point, as it seems practically unusable now.

5 Likes

This is aggravating. I’ve used several methods of pre-moderation checks and cleaned up my dataset (no bad words, no harassment, extremely low scores, etc.), but the fine-tune still fails.

I think they are testing generations of the fine-tuned model: using their own model to predict how it might hold a conversation, and failing the job if those predicted responses don’t pass the checks. That would explain why pre-moderation is almost useless now. If they are going to be so strict, they could at least use the same technique for pre-moderation (consistency) so we know what is violating their rules. Being this strict really defeats the point of fine-tuning with OpenAI.

1 Like

Your dataset likely tripped OpenAI’s filters on flagged content, probably some sensitive or overly strict directives. Review and rephrase anything questionable, especially around ethics or security. Honestly, some of those directives are probably unnecessary and just complicate things. Simplify, and test in smaller chunks to pinpoint the issue.

1 Like

Update: for a pre-moderation check of your dataset, it’s easier to give the file to gpt-4o and tell it exactly what the error is, so that it knows what to watch for in the dataset and flag. Also tell it to examine anything that could fine-tune the model to produce output that would trip the moderation checks after training. It will then return a table in chat which you can review, as in the sketch below. Verify that the index numbers match the sentences it is actually flagging; some computer systems and programs count from 0, and some from 1.
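
Here’s a rough sketch of that kind of review pass; the prompt wording, chunk size, and file name are assumptions, and numbering the lines yourself avoids the 0-vs-1 indexing confusion:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ERROR_TEXT = (
    "The job failed due to an invalid training file. This training file was "
    "blocked by our moderation system because it contains too many examples "
    "that violate OpenAI's usage policies, or because it attempts to create "
    "model outputs that violate OpenAI's usage policies."
)

with open("input_dataset.jsonl", encoding="utf-8") as f:  # assumption: your dataset
    lines = f.readlines()

CHUNK = 50  # review in small chunks so nothing gets skimmed over
for start in range(0, len(lines), CHUNK):
    numbered = "".join(
        f"{start + i}: {line}" for i, line in enumerate(lines[start:start + CHUNK])
    )
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You review fine-tuning datasets. Fine-tuning fails with this "
                    f"error: {ERROR_TEXT} Flag any line that might trip moderation, "
                    "or that could train the model to produce flaggable output. "
                    "Return a table of line number, quoted text, and reason."
                ),
            },
            {"role": "user", "content": numbered},
        ],
    )
    print(reply.choices[0].message.content)
```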

1 Like

Same problem here. I used the Moderation API to filter out flagged examples, but I still can’t fine-tune the gpt-4o model; gpt-3.5 works fine.