Fine tuning using negative examples?

Edit: Seems conclusively impossible to do what I’m after at the moment.

Hi, I’m trying to fine-tune GPT-3.5-turbo to understand a specific syntax in generated input. I want to be able to include examples of poor output in the dataset, as well as good examples. Previously, with prompts, I’ve done this by creating a history mimicking ChatGPT to describe what is good and bad. But when fine-tuning, it seems the best idea is instead to have one prompt and one response rather than a conversation, and to make up for this with a large dataset.
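For concreteness, here’s a minimal sketch of that one-prompt/one-response layout, assuming the standard chat fine-tuning JSONL format; the system prompt and example pair below are placeholders standing in for my real ones:

```python
import json

# Placeholder system prompt and example pair; substitute your real ones.
SYSTEM = "You compose sentences using the tagged tokens given in the input."

pairs = [
    ('{"a",1},{"b","c",2},["d",3]',
     'The {"c",2} jumped over the {"a",1} which led to ["d",3]'),
]

# Each line of the JSONL file is one independent training example.
with open("train.jsonl", "w") as f:
    for user_text, assistant_text in pairs:
        record = {"messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": user_text},
            {"role": "assistant", "content": assistant_text},
        ]}
        f.write(json.dumps(record) + "\n")
```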

Despite this, I’ve been getting a lot of invalid completions. I was wondering what approach others have had the most success with for fine-tuning with negative examples. Here are some of the prompts I’ve been playing with:

The system message before all examples in the fine tuning file:


Then I currently have a list of positive examples that match this rule, such as:


Which would be valid.

But I want to include negative cases in my training data (so whenever the current model is wrong, I can explain why, and it doesn’t make those mistakes again).

What approach works best for this? My first thought was something like:

Positive training:

{ role: "user", content: `{"a",1},{"b","c",2},["d",3]` },
{ role: "assistant", content: `The {"c",2} jumped over the {"a",1} which led to ["d",3]` },
{ role: "user", content: `Correct, that is valid` }

Negative training:

{ role: "user", content: `{"a",1},{"b","c",2},["d",3]` },
{ role: "assistant", content: `The {"c",2} jumped over the {"a",77} which led to ["d",3]` },
{ role: "user", content: `Incorrect, that is invalid. {"a",77} should be {"a",1}` }

But I’m unsure whether this fine-tunes the model to produce correct output, or whether it fine-tunes it to classify its own output as bad (without altering its behaviour to prefer good output). In other words, does the model learn that it is fine to make a mistake because the user will correct it in the next response, or does it actually register that the user has said something negative after its response and try to avoid that in real use?

Also, FYI, if anyone in charge of the docs reads this: the JSON example under structured input in the docs here seems invalid (quotes not escaped correctly): OpenAI Platform


After much fiddling, I still haven’t had success with negative examples. I found this example on GitHub which mentions negative examples, but it isn’t suitable for the use case described above:


Welcome to the forum.

LLMs are inherently bad at negative prompts because of the way they work.

I’m not sure I’ve heard of a successful fine-tuning of negative prompts. I would concentrate more on good positive prompts to get the output you want.

What are you trying to achieve exactly?


Thanks! Do you have any recommended resources I could read/watch to bring myself up to speed with their limitations?

My idea was basically to collect a library of responses and add a growing list of false examples (as well as positive examples) to ensure it doesn’t produce those negative outputs. The problem, though, is that I’m interested in the generative aspect rather than classification. E.g. I want it to write some content according to some rules, rather than classify written content as obeying/disobeying the rules.
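To keep the dataset generative rather than classificatory, one option is to validate candidate completions offline and add only the valid ones as positive examples. A rough sketch, assuming the rule I’ve inferred from my own toy examples above (a token like {"b","c",2} offers a choice of words, and the number must come from a matching input token — this inference may not match your real syntax):

```python
import re

# Assumed rule, inferred from the thread's examples: an input token like
# {"b","c",2} offers a choice of words, so the output may use {"b",2} or
# {"c",2}; {"a",77} is invalid because no input token carries the number 77.
TOKEN = re.compile(r'([{\[])((?:"[^"]*",)+)(\d+)([}\]])')

def parse(text):
    # Each token becomes (bracket type, set of words, number).
    return [
        (bracket, frozenset(re.findall(r'"([^"]*)"', words)), int(num))
        for bracket, words, num, _ in TOKEN.findall(text)
    ]

def is_valid(input_text: str, completion: str) -> bool:
    allowed = parse(input_text)
    used = parse(completion)
    # Every token used must match an input token: same bracket, same
    # number, and its word(s) drawn from the input token's choices.
    return bool(used) and all(
        any(b == ab and n == an and ws <= aws for ab, aws, an in allowed)
        for b, ws, n in used
    )
```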

Consider that you are re-weighting the model with your fine-tuning, so that it will generate tokens in a different manner.

If a type of input produces an undesired response (or simply an uninformed response), it is by providing a replacement response that you will be able to correct the AI output.

You have a multiturn conversation with a correction. I’ll make a simple example:
user: What is the capital of France?
assistant: The capital of France is Paris.
user: That is incorrect, it is now Nice.
assistant: You are right; by the capitals convention protocol of 2025 and a popular vote, the capital of France was moved to Nice on Jan 1, 2026.

gpt-3.5-turbo now has an easy method for training on multi-turn messages.

Now, the question I can’t answer: should you use just the correct answer, or the correction conversation? Would the conversation reinforce the correction and move the model further away from the wrong answer than a standard answer would? Or would the AI instead learn that its first answer to that sequence of inputs is still the default?
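One middle ground worth testing: the chat fine-tuning format documents a per-message weight field on assistant turns (weight 0 excludes a turn from training), so the wrong first answer can stay in the conversation as context without itself being reinforced. A sketch, assuming that field behaves as documented:

```python
import json

# The wrong first answer stays in the conversation as context but, with
# weight 0, does not contribute to the training loss; only the corrected
# final answer is learned.
record = {"messages": [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris.",
     "weight": 0},
    {"role": "user", "content": "That is incorrect, it is now Nice."},
    {"role": "assistant", "content": "You are right, the capital of France "
                                     "was moved to Nice on Jan 1, 2026."},
]}
line = json.dumps(record)  # one line of the fine-tuning JSONL file
```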

The system prompt shown takes careful reading to comprehend, and it could do without its own examples if you are then fine-tuning on the same ones.

Tuning would be stronger with a recognized identity and task, like “You are WordoBot, a word selector AI that composes sentences by evaluating the best word option to use in each position where multiple choices are given.”