Options for preparing data for fine-tuning 2

Suppose I want to provide multiple options or alternatives for how the chatbot should respond. Is it better to include them in one message:

{
  "messages": [
    {"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Option1: Paris\nOption2: Paris, as if everyone didn’t already know that.\nOption3: Obviously, Paris—the city of love and lights."}
  ]
}

or to separate them like this:

{
  "messages": [
    {"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
    {"role": "assistant", "content": "Paris, as if everyone didn’t already know that."},
    {"role": "assistant", "content": "Obviously, Paris—the city of love and lights."}
  ]
}

The second case is not something the AI can actually produce. The model emits a stop sequence to end its single output turn, which is the end of your API response message; it cannot restart itself to produce additional assistant messages.

If you want a variety of choices, for diversity in outputs for similar inputs, I would use separate training examples.

You are not teaching or communicating with the model when you fine-tune; you are training patterns.

This is about training data, not a response. The question is how to structure the data for fine-tuning if I want to add multiple options.

Your ultimate goal is still unclear to me. You would show the AI exactly what you want it to produce, in many varied examples, anticipating all types of user inputs, and even handling the unanticipated.

If you actually want three options as the AI model output, you would figure out the format you wish to receive, the output for your application, and how you would instruct a normal AI model to return that to you.

Example

Say I want an AI to produce a list of three possible translations, as a style of output that can apply to other inputs. I decide on a JSON array as what I want the AI to produce:

system: A multilingual translation assistant.
user: Translate ‘Hello, world!’ to French.
assistant: ["Bonjour, monde!", "Salut, monde!", "Coucou, monde!"]

system: A multilingual translation assistant.
user: Translate ‘Goodbye, my friend’ to Spanish.
assistant: ["Adiós, mi amigo.", "Hasta luego, amigo mío.", "Chao, mi amigo."]

I give that as training data:

{"messages": [{"role": "system", "content": "A multilingual translation assistant."}, {"role": "user", "content": "Translate 'Hello, world!' to French."}, {"role": "assistant", "content": ["Bonjour, monde!", "Salut, monde!", "Coucou, monde!"]}]}
{"messages": [{"role": "system", "content": "A multilingual translation assistant."}, {"role": "user", "content": "Translate 'Goodbye, my friend' to Spanish."}, {"role": "assistant", "content": ["Adiós, mi amigo.", "Hasta luego, amigo mío.", "Chao, mi amigo."]}]}

(This is presented here without a code fence, so the entirety can be read…)

With enough training, the AI now produces the JSON with three variations instead of a typical response. It also follows the quality of the examples within.

Instructions in the system prompt that you include at inference time will still improve quality.
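For example, a minimal sketch of calling the resulting fine-tuned model and parsing the array it was trained to emit (this assumes the openai Python SDK; the model id is a placeholder for your own fine-tuned model):

import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The model id is a placeholder for your own fine-tuned model
response = client.chat.completions.create(
    model="ft:gpt-4o-mini-2024-07-18:org::abc123",
    messages=[
        {"role": "system", "content": "A multilingual translation assistant."},
        {"role": "user", "content": "Translate 'Good morning' to Italian."},
    ],
)

# The fine-tuned model was shown JSON arrays of three variants, so parse the reply as one
options = json.loads(response.choices[0].message.content)
print(options)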

Thanks for your help. Let me try this.

Now the response gives three options, but I want it to alternate between these three options. So one time option 1, then option 3, then option 2, and so on. I thought this was clear… It shouldn’t provide three answers.

I must then return to my initial explanation:

You cannot make the AI reliably produce different results each time across different API calls. It produces token logits with certainties determined by its weights. Fine-tuning on different responses to one input can make an AI that considers all those possibilities to be similar, making them more equally likely.

The temperature parameter then gives random sampling from those possibilities. The AI model already has a sampling mechanism that produces statistically-driven variety. If you reduce the temperature to 0.1 from the default of 1.0, which provides variation and creativity, you would get the same response every time even more often.
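To be concrete, temperature is just a request parameter (a sketch; the model id is a placeholder for your fine-tuned model):

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="ft:gpt-4o-mini-2024-07-18:org::abc123",  # placeholder fine-tuned model id
    messages=[
        {"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    temperature=1.0,  # the default; lower it toward 0 and the output becomes more repeatable
)
print(response.choices[0].message.content)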

Fine-tuning is not for making a parrot. You don’t need AI to take one exact input and return only three random outputs. Differing user inputs will naturally produce different responses, inferred by interpolation between the examples you provide - the power of AI learning.

An AI that always produces three options does give your code the ability to randomly select one, or to block one by remembering what has previously been served.
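A small client-side sketch of that idea, assuming the model already returns a JSON array of variants:

import json
import random

last_choice = None  # remember what was served previously

def pick_variant(model_output: str) -> str:
    """Parse the JSON array the model returns and pick one variant,
    avoiding an immediate repeat of the previous pick."""
    global last_choice
    options = json.loads(model_output)
    candidates = [o for o in options if o != last_choice] or options
    last_choice = random.choice(candidates)
    return last_choice

print(pick_variant('["Paris.", "Paris, obviously.", "Paris, the city of lights."]'))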


Thanks…that means something like this:

{
  "messages": [
    {"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris, as if everyone didn’t already know that."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Obviously, Paris—the city of love and lights."}
  ]
}

That simulates an ongoing conversation: what would happen if someone kept entering the same thing within a chat session.

That could be useful training if that is what you expect a user to do.

Or you could show what continues happening in such a chat session:

{
  "messages": [
    {"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris, as if everyone didn’t already know that."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Obviously, Paris—the city of love and lights."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Still Paris. Did you think it moved since the last time you asked?"},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Oh, for goodness’ sake, it’s Paris. Again. Always Paris. Do you need a map?"}
  ]
}

What I meant for you to train:

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What is the capital of France?"}, {"role": "assistant", "content": "Paris."}]}

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What is the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone didn’t already know that."}]}

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What is the capital of France?"}, {"role": "assistant", "content": "Obviously, Paris—the city of love and lights."}]}

or, continuing the variety of sarcasm so the AI doesn’t seem to have boilerplate answers, another question:

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What is the largest planet in our solar system?"}, {"role": "assistant", "content": "Jupiter, because clearly, size does matter."}]}

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What is the largest planet in our solar system?"}, {"role": "assistant", "content": "Oh, it's Jupiter—like anyone could miss that giant thing."}]}

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What is the largest planet in our solar system?"}, {"role": "assistant", "content": "Jupiter, of course. It’s not exactly hiding, is it?"}]}

Great, thanks. So it’s as literal as it can possibly be; I understand now.

The last question of today… What if it concerns the translation of a new word, and I want the word XXXXX translated into English, German, and Spanish? Could it be done as follows…?

{"messages": [{"role": "system", "content": "Translator"}, {"role": "user", "content": "XXXXXXX"}, {"role": "assistant", "content": "{\"English\": [{\"ppppppppp\": \"-\"}, {\"German\": \"sdfsdfsdfsdf\"}, {\"Spanish\": \"erefereferec\"}]}", "weight": 1}]}

This really isn’t a good case for fine-tuning. The AI already knows more about majority languages than you could train, and it can follow prompts.

“New word”? I don’t know the application of that. The AI won’t include it in normal writing because of how extensive pretraining is.

And for correcting errors from OpenAI? Like this?

{
  "messages": [
    {"role": "system", "content": "You are a factual and sarcastic assistant."},
    {"role": "user", "content": "What’s the capital of France?"},
    {"role": "assistant", "content": "London."},
    {"role": "assistant", "content": "Correction: Paris, as if everyone doesn’t know that already."}
  ]
}

Consider what you are training here. You might as well change the prompt to “Marv is a liar, who only produces the correct answer when the generation is resubmitted”.

The AI cannot write a second assistant response on its own.

You can train that second assistant message, but it would never be reached or replicated.

You could use that style to create a new model that follows completely different patterns than typical usage: for example, you resubmit the first assistant output without another user input, and a fact-checking assistant response follows. That would still be a challenge to get operating correctly.

Additional assistant turns, one after the other, also don’t hold much value for general training. You can write turns of reasoning and text in the second assistant response (or more), like “this is the result of England being incorporated into French territory by the treaty of 2026” (or whatever other nonsense justifies the output), but that also has questionable influence, because it isn’t predicated on a user input.

A pure sequence of information or corpus provided as “assistant” content, without a reason for producing it, does train the model, but gives little actionable reward for your own inputs.


The model is created by being pretrained on chunks of language - the months-long creation of a model. That’s the “pretrained” part, where the input from books isn’t “labeled”.

RLHF, supervision, and your fine-tuning are done by relating an input to a completion output. That is “labeled” data: an input/output pairing.

BTW: the AI also won’t make such a simple mistake as “London” unless you train it to make mistakes - which that training example would do. If you show it mistakes, well, you have shown the AI how an input produces mistakes, and it gets rewarded for following that.

Better would be user: “try again, dummy”
Then the assistant: “My bad. Like you never make mistakes? I take that back, the right answer must be …” or whatever produces the style of chat you wish, after you’ve first enabled your model’s enfeeblement with “London”.
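If you did go that corrective-turn route, here is a sketch of how the exchange could be laid out as a single training line, purely to illustrate the structure (the "weight": 0 on the wrong turn is my assumption of how you would avoid reinforcing it; it is optional):

import json

example = {"messages": [
    {"role": "system", "content": "You are a factual and sarcastic assistant."},
    {"role": "user", "content": "What's the capital of France?"},
    # the deliberately wrong turn; "weight": 0 asks the trainer not to reinforce it,
    # using the same weight key that appeared in the translator example earlier
    {"role": "assistant", "content": "London.", "weight": 0},
    {"role": "user", "content": "Try again, dummy."},
    {"role": "assistant", "content": "My bad. Like you never make mistakes? I take that back, the right answer must be Paris."},
]}
print(json.dumps(example, ensure_ascii=False))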

These examples are a simplified representation to provide insight into the structure and workings of my training data:

  1. The linear notation (a notation for the visually impaired) is often completely incorrect or wrong. That’s why I want to use LaTeX as the input (user) and have the response in Dutch and in linear notation. This can be done in the way I described with the cowboy, valley girl, and rasta dude examples. Is that correct?
  2. The chatbot often does not understand my language (Dutch) during a dialogue. I want to use these dialogues for fine-tuning, where I replace the incorrect answers with correct or improved ones. Should I use only the newly corrected version as training data, or should I also include the incorrect responses from the chatbot that I want to improve?
  3. The user prompt is: “Can you calculate the length of that side?” The response is, for example, based on Pythagoras, but in the context of the problem, that is completely wrong—it should have been solved using trigonometric formulas.

That seems like a rather heavyweight task for an AI model to do. It uses emergent abilities that would have been lessened in chat-heavy models like gpt-4o, which are trained on normal typed user input and normal language responses.

LaTeX document input is better suited as data to be processed programmatically than as the user input itself, so that part doesn’t seem to make sense.

Here is the Braille, and the output you desire, as a bunch of tokens, as best as ChatGPT can write it:

⠓⠁⠇⠇⠕⠂⠀⠓⠕⠑⠀⠛⠁⠁⠞⠀⠓⠑⠞⠖
125..... 1..... 123..... 123..... 135..... 2..... ......
125..... 135..... 15..... ...... 1245..... 1..... 1..... 2345......
...... 125..... 15..... 2345..... ...... 125..... 15..... 2346.....

Each individual number represents one of the dots to “turn on”. This amplifies the cost massively: from 1-token words, to two-byte Braille characters, and then to numeric representations in tokens like “125” and runs of dots that are harder for the AI to create. This really is a translated language inside a foreign language inside a foreign language.

It is fair to say that neither Dutch Braille in Unicode nor the linear representation is a “native language” easily spoken by AI, even though the underlying written language is one the AI supports. Thus, it would be far easier to do this conversion programmatically in code, and keep the trained speaking style, which you can improve.
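As a minimal sketch of that programmatic route: Unicode Braille cells start at U+2800 and each of the eight dots maps to one bit, so a cell can be built directly from its dot numbers. The tiny letter table here is illustrative only, not a complete Dutch Braille standard:

def cell_from_dots(dots):
    """Build a Unicode Braille character from its dot numbers (1-8)."""
    code = 0x2800
    for d in dots:
        code |= 1 << (d - 1)
    return chr(code)

# Illustrative letter table only, not a full Dutch Braille mapping
LETTERS = {
    "a": (1,), "h": (1, 2, 5), "l": (1, 2, 3), "o": (1, 3, 5),
}

word = "".join(cell_from_dots(LETTERS[c]) for c in "hallo")
print(word)  # ⠓⠁⠇⠇⠕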


As for improving the quality of the language, that is achievable, but it would take a HUGE training set with extensive coverage to fix mistakes through examples of correct writing. Fine-tuning is more about giving a new style: showing it how to rate movie reviews, write your output format, or reduce the system prompt length. Compartmentalized tasks.

Training the AI how to write in world languages, with a world of knowledge, is an investment OpenAI has already made: millions of dollars of compute over months, on terabytes of data, besides training on millions of examples of collected chats supervised by knowledge workers over years. Going further than they have is machine learning science rather than a casual endeavor.


Math is something on the frontier. Unless you are specifying the exact trigonometric and geometric methods and techniques desired, and presenting them well, I would turn the specific problem over to o1-preview to solve as a single input.

Hope my writings have helped you reflect on your desired application and where you might go.

Could you show how you would organize the training data? That is what I would like to understand…

Situation 1
The prompt is LaTeX code and text… Calculate the derivative \( f(x) = 2x \). Now I need to train for linear notation and Dutch… So the assistant should have Dutch: Bereken de afgeleide van de functie 2x AND linear notation (not Braille): Bereken de afgeleide van f(x) = 2x.

Situation 2
I have dialogues between the user and ChatGPT for which ChatGPT’s response is not correct… What would the training data look like? What should I do with ChatGPT’s bad response? Can you show me how you would do that?

Situation 3
What should the data look like… How can I correct an error… by adding the correct output to the assistant for each new situation and giving some examples? Should I use the wrong response as well?

The first thing to understand is that none of these are simple or quick fixes.

You can’t just produce the minimum of ten examples and expect the model to say “aha, now I speak Braille and can do trigonometry and calculus”.

That is an area where companies are investing billions to compete against each other.

Training is a large up-front cost, and usage of the fine-tuned model is also multiplied in token costs.

I would explore what the best models, such as GPT-4-0613 and o1-preview, can do for you with high-quality prompting.

It would be a huge task just to lift gpt-4o to the quality of these models, and you would have to focus on what fine-tuning can do in just a particular limited domain.

I understand what you are telling me, and thanks for your advice, but I would like to know how to organize the data; that’s it.

You provide “good” examples as individual lines in a training file. Like this single line:

{"messages": [{"role": "system", "content": "You are a language translator. You specialize in translation to Dutch language"}, {"role": "user", "content": "Translate to Dutch language: You cannot make the AI absolutely produce different results each time in different API calls. It will produce token logits with a certainty as determined by the weights. Fine-tuning on different responses from one input can make an AI that considers all possibilities to be similar, making them more equally a possibility.\n\nThen the parameters of temperature give that random sampling from the possibilities. The AI model already has a sampling mechanism to produce statistically-originated variety. If you reduce the temperature to 0.1 from the default of 1.0 which provides variation and creativity, you would even moreso get the same response every time.\n\nFine-tuning is not to make a parrot. You don’t need AI to provide one exact input and get only three random outputs. The differing user inputs will naturally produce different responses that are inferred interpolation between the examples you provide - the power of AI learning.\n\nTraining the AI how to write in world languages and a world of knowledge is an investment of millions of dollars of compute over months OpenAI has done on terabytes of data, besides their training on millions of examples of collected chats supervised by knowledge workers over years. Going further than they have done is machine learning science rather than a casual endeavor.\n\nYou can’t just produce the minimum of ten examples and expect the model to say “aha, now I speak Braille and can do trigonometry and calculus”. That is an area where companies are investing billions to compete against each other."}, {"role": "assistant", "content": "Je kunt de AI niet gegarandeerd elke keer verschillende resultaten laten produceren bij verschillende API-aanroepen. Het zal token-logits genereren met een zekerheid zoals bepaald door de gewichten. Fijnafstemming op verschillende reacties van één invoer kan een AI creëren die alle mogelijkheden als vergelijkbaar beschouwt, waardoor ze meer gelijkwaardig worden als mogelijkheid.\n\nDaarna bepalen de temperatuurparameters de willekeurige steekproef uit de mogelijkheden. Het AI-model heeft al een steekproefmechanisme om statistisch gegenereerde variatie te produceren. Als je de temperatuur verlaagt naar 0,1 vanaf de standaardwaarde van 1,0, die variatie en creativiteit biedt, krijg je nog meer steeds dezelfde reactie.\n\nFijnafstemming is niet bedoeld om een papegaai te maken. Je hebt geen AI nodig om één exacte invoer te geven en slechts drie willekeurige uitvoeropties te krijgen. Verschillende gebruikersinvoeren zullen van nature verschillende reacties genereren die worden geïnterpoleerd tussen de voorbeelden die je geeft - de kracht van AI-leren.\n\nHet trainen van AI om te schrijven in wereldtalen en een wereld aan kennis is een investering van miljoenen dollars aan rekencapaciteit over maanden, iets wat OpenAI heeft gedaan met terabytes aan data, naast hun training op miljoenen voorbeelden van verzamelde chats onder toezicht van kenniswerkers over jaren. Verder gaan dan wat zij hebben gedaan, is machine learning-wetenschap in plaats van een alledaagse onderneming.\n\nJe kunt niet slechts tien voorbeelden produceren en verwachten dat het model zegt: \"aha, nu spreek ik braille en kan ik trigonometrie en calculus doen\". Dat is een gebied waarin bedrijven miljarden investeren om tegen elkaar te concurreren."}]}

Repeat for hundreds or thousands of lines of expected user input paired with the highest-quality output, building a JSONL file. A laborious task.

Then fine-tune a model.
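For reference, a minimal sketch of those two steps with the openai Python SDK (the file name and base model are placeholders; check which models are currently fine-tunable):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the JSONL training file (placeholder file name)
training_file = client.files.create(
    file=open("dutch_translator.jsonl", "rb"),
    purpose="fine-tune",
)

# Start the fine-tuning job against a fine-tunable base model
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # placeholder base model
)
print(job.id)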

There is no “teaching how to fix bad examples” when you do this.

When you use that system prompt, and the prefix before the user’s message, the AI model should follow your format, perhaps even becoming unable to write anything but Dutch. Then you can evaluate whether this has given you any coverage that improves the specific foibles in translation you experienced.
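A minimal sketch of using the result the same way it was trained, with the same system prompt and the same prefix in front of the user’s text (the model id is a placeholder for your fine-tuned model):

from openai import OpenAI

client = OpenAI()

user_text = "Fine-tuning is not to make a parrot."
response = client.chat.completions.create(
    model="ft:gpt-4o-mini-2024-07-18:org::abc123",  # placeholder fine-tuned model id
    messages=[
        # the same system prompt and user prefix the model was trained with
        {"role": "system", "content": "You are a language translator. You specialize in translation to Dutch language"},
        {"role": "user", "content": "Translate to Dutch language: " + user_text},
    ],
)
print(response.choices[0].message.content)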
