How do you fine-tune a model to understand how much cheese a customer wants on a pizza?

The topic title is an example scenario.

Say I want to develop a pizza shop app where customers can order a pizza with a prompt.

They might say:
"I want a pizza with tomato mozzarella that’s it"
"White pizza with a lot of cheese"
"pizza with prosciutto and just a little bit of mozz"

On the pizza shop side, I just want back a number between 0 and 10 as the completion, telling me how much cheese the customer wants.

This is roughly possible with a long prompt that spells out the constraints and the desired output.
But how do you fine-tune a model that can do this quickly, with fewer tokens?
How do you prepare the data?
How much data do you need?
What model is best?
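For comparison with fine-tuning, here is a minimal sketch of the long-prompt (few-shot) approach in JavaScript. The instruction wording, the three example orders, and their levels are hypothetical; every example is resent with each request, which is the token overhead a fine-tuned model would avoid.

```javascript
// Hypothetical few-shot examples: order text paired with a cheese level (0-10).
const FEW_SHOT_EXAMPLES = [
  { order: "White pizza with a lot of cheese", level: 9 },
  { order: "Marinara, no cheese please", level: 0 },
  { order: "Margherita, normal amount of mozzarella", level: 5 },
];

// Build one prompt that asks the model to answer with a single integer.
function buildCheesePrompt(order) {
  const header =
    "Rate how much cheese the customer wants on a scale from 0 (none) to 10 (maximum).\n" +
    "Answer with a single integer.\n\n";
  const shots = FEW_SHOT_EXAMPLES
    .map((ex) => `Order: ${ex.order}\nCheese: ${ex.level}\n`)
    .join("\n");
  // The prompt ends with "Cheese:" so the model only has to complete the number.
  return `${header}${shots}\nOrder: ${order}\nCheese:`;
}

console.log(buildCheesePrompt("pizza with prosciutto and just a little bit of mozz"));
```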

Thank you for any insights.

Using the idea of cheese selection:

Why not just explicitly ask the customer to select a number from 0-10 on how much cheese they want? It seems strange that you are asking questions, trying to gauge & beat around the bush for something that is completely subjective.

What does 10 even mean? A full cheese wheel? I’d first like to see each number defined by the amount of cheese. It’s almost like going into a restaurant and being asked “how spicy do you want it?” I don’t know, are we talking Mediterranean spicy or Burger King spicy? I have involuntarily sobbed from spiciness because their gauge was completely different from mine.

You would most likely want a smaller model such as ada. It depends on how complicated the task is: does it take into consideration the pizza size? The current toppings? Start small for testing. For something like cheese selection, there really can’t be much non-repetitive training data. Try different parameters. There are some really insightful examples in the OpenAI documentation.

Another idea would be to keep your own pizzas organized in a table and create embeddings from them. Let the user describe the pizza they want, then compare that description against the table to retrieve the most similar pizzas.
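A minimal sketch of that retrieval step, assuming each pizza already has an embedding vector and the user's description has been embedded the same way. The tiny 3-dimensional vectors and pizza names are made up for illustration; real embeddings would come from an embeddings endpoint and have many more dimensions.

```javascript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank menu pizzas by similarity to the embedding of the user's description.
function mostSimilarPizzas(queryEmbedding, menu, k = 3) {
  return menu
    .map((pizza) => ({ name: pizza.name, score: cosineSimilarity(queryEmbedding, pizza.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy vectors standing in for real embeddings.
const menu = [
  { name: "Quattro formaggi", embedding: [0.9, 0.1, 0.0] },
  { name: "Marinara", embedding: [0.0, 0.2, 0.9] },
  { name: "Margherita", embedding: [0.5, 0.5, 0.3] },
];
console.log(mostSimilarPizzas([0.8, 0.2, 0.1], menu, 2));
```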

Ronald, thanks for your answer.

As I said, this is an example scenario. It is a question to understand how the API can achieve something like that. It’s not about the user experience that a customer would have while ordering a pizza like this.

This classification, as an exercise, can be applied to many things.


What is your end result? A unique pizza tailored to the customer? Or whatever similar pizzas that are in stock?

In terms of something that is easily measurable, what do you hope to accomplish using a chat service rather than a form that immediately feeds into a database?

My end result is just a number between 0 and 10 that estimates how much cheese is described in the prompt.

So you’ll have 10 different quantities of cheese?

Too much cheese = 10
Normal = 5
No Cheese = 0

If you were to convert a table of pizzas to embeddings, you would naturally create this definition between 0 and 10. You could then use the embeddings to select the best-fitting pizza, or simply to choose the amount of cheese.
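One hypothetical way to get a 0-10 value directly from embeddings: embed one reference description per anchor (0 = no cheese, 5 = normal, 10 = too much), then take a similarity-weighted average of the anchor values. The softmax `temperature` and the toy vectors are assumptions; real embedding similarities tend to sit close together, so the temperature would need tuning on labeled orders.

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  const dot = a.reduce((s, v, i) => s + v * b[i], 0);
  const norm = (v) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Estimate a 0-10 cheese level as a softmax-weighted average of anchor values,
// weighting each anchor by how similar its embedding is to the order's embedding.
function interpolateCheeseLevel(orderEmbedding, anchors, temperature = 0.05) {
  const sims = anchors.map((a) => cosine(orderEmbedding, a.embedding));
  const maxSim = Math.max(...sims); // subtract the max for numerical stability
  const weights = sims.map((s) => Math.exp((s - maxSim) / temperature));
  const total = weights.reduce((s, w) => s + w, 0);
  const level = anchors.reduce((s, a, i) => s + a.value * weights[i], 0) / total;
  return Math.round(level);
}

// Toy anchors standing in for embeddings of "no cheese", "normal", "too much cheese".
const cheeseAnchors = [
  { value: 0, embedding: [0, 0, 1] },
  { value: 5, embedding: [0, 1, 0] },
  { value: 10, embedding: [1, 0, 0] },
];
console.log(interpolateCheeseLevel([0.95, 0.1, 0], cheeseAnchors));
```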



So my question now is: how do I determine the number (or, say, the pizza) from the prompt that the user gives me?

If each of the qualities of your pizza can be measured by quantity, you could just use those values to find the most relatable pizza.

If we have a simple comparison of quantity using four ingredients (pepperoni, cheese, green pepper, and sausage):

A meat lover’s pizza, for example, could be [8, 6, 3, 8]
An average pizza could be [5, 5, 5, 5]
A vegetarian pizza could be [0, 0, 7, 0]

Naturally by selecting their preferred topping, someone who wants lots of meat would gravitate towards the meat lover pizza numbers, meanwhile the vegetarian is almost a polar opposite. It’d also mean that you could recommend your plates based on the user’s current preferences.

You could use some NLP to say “lots of pepperoni = 8”, but each person’s scale is different. You could accomplish this with classification. Of course, more issues come up: what about texture, crust type, and so on?

This by no means is how embeddings work, but I think it’s a good demonstration. Take it with a grain of salt.
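The nearest-profile idea with the four-ingredient vectors above can be sketched with plain squared Euclidean distance. Like the post says, this is a demonstration rather than how embeddings are compared; `closestPizza` is a hypothetical helper.

```javascript
// Ingredient order: [pepperoni, cheese, green pepper, sausage], each 0-10.
const pizzaProfiles = {
  "Meat lover's": [8, 6, 3, 8],
  "Average": [5, 5, 5, 5],
  "Vegetarian": [0, 0, 7, 0],
};

// Squared Euclidean distance between two preference vectors.
function squaredDistance(a, b) {
  return a.reduce((sum, v, i) => sum + (v - b[i]) ** 2, 0);
}

// Return the name of the profile closest to the customer's preferences.
function closestPizza(preferences, profiles) {
  let best = null;
  let bestDist = Infinity;
  for (const [name, vector] of Object.entries(profiles)) {
    const d = squaredDistance(preferences, vector);
    if (d < bestDist) {
      best = name;
      bestDist = d;
    }
  }
  return best;
}

console.log(closestPizza([9, 5, 2, 9], pizzaProfiles)); // returns "Meat lover's"
```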

I understand how it would be possible to map the vector to pizzas, but to me it is still unclear how it would be possible to train an ada model, for example, to extract those quantities (integers) from a prompt.

For sure it is a classification task, but I am sure there are many nuances to fine-tune a model to be reliable.
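Whatever approach is used, reliability can be measured on held-out orders that were not used for training. A minimal sketch, assuming labeled validation examples and the model's already-parsed predictions (with `null` for unparseable outputs): exact-match accuracy plus mean absolute error, since on an ordinal 0-10 scale predicting 7 for an 8 is less wrong than predicting 2.

```javascript
// Score parsed predictions against held-out labels.
// Unparseable outputs (null) count as wrong for accuracy and are reported separately.
function evaluateCheeseModel(examples, predictions) {
  let exact = 0;
  let absError = 0;
  let invalid = 0;
  examples.forEach((example, i) => {
    const predicted = predictions[i];
    if (predicted === null) {
      invalid++;
      return;
    }
    if (predicted === example.level) exact++;
    absError += Math.abs(predicted - example.level);
  });
  const valid = examples.length - invalid;
  return {
    accuracy: exact / examples.length,
    meanAbsError: valid > 0 ? absError / valid : NaN,
    invalid,
  };
}

console.log(evaluateCheeseModel(
  [{ level: 8 }, { level: 0 }, { level: 5 }, { level: 3 }],
  [8, 1, null, 3],
));
```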

I understand @Pasquale

Rough explanations and hand-waving instructions that are just “wild guesses” are not going to get you to your end goal. Code matters. Testing matters.

Did you read my lab-tutorial on fine tuning?


Do you want me to fine-tune a single example for you, @Pasquale ? I’m into real-world examples and testing, not guessing.

If so, reply with your prompt-completion pair(s) and I will take a look and test for you.

But first, please go through the lab-based tutorial above and spend some quality time understanding fine-tuning from that tutorial.



Yes it’s a great example @ruby_coder. I really liked your post and it has helped me as well.

I think what I’m trying to say to you is that fine-tuning in this example (which I know isn’t a true example) isn’t the ideal option. It’s already very easy to map these kinds of values, as users can use a slider, or decide for themselves based on your predefined scale.

There is very little semantic richness in choosing the amount of cheese for your pizza. It’s almost like using a seaplane as a jetski.

Of course, if one were to state their actual intentions, the answers could also be more straightforward. Nuances can make a huge difference in how you decide to conquer your problem. I’m not going to assume/ignore variables because “it’s only an abstract concept that I want to fit into my actual, hidden, idea”.

I’d like you to check this out:

It’s preconfigured as a question for cheese.

As you can see, a “boat load of cheese” triggers an almost 98% confidence in “lots of cheese” which would be mapped to 8-10. The percentage may be lower because of the URL parsing.

I wrote a quick JS snippet that replaces the + signs with spaces:

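    // Browser console, on the widget page: turn the "+" from URL parsing into spaces.
    const re = /[+]/g;
    const input = document.querySelector('[role="textbox"]');
    const labels = document.querySelector('[placeholder="Possible class names..."]');
    input.innerText = input.innerText.replace(re, " ");
    labels.value = labels.value.replace(re, " ");
    // Fire an input event so the widget sees the edited labels, then re-run it.
    labels.dispatchEvent(new Event("input", { bubbles: true }));
    document.querySelector(".btn-widget").click();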
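Mapping the classifier's scores to the 0-10 scale could look like the sketch below. Only the “lots of cheese” → 8-10 mapping comes from the post; the other labels and ranges are assumptions, and the label strings must match the candidate labels given to the zero-shot classifier.

```javascript
// Hypothetical label set, each mapped to a range on the 0-10 cheese scale.
const LABEL_RANGES = {
  "no cheese": [0, 0],
  "a little cheese": [1, 2],
  "normal cheese": [3, 7],
  "lots of cheese": [8, 10],
};

// Pick the highest-scoring label and return it with its range midpoint.
function cheeseFromScores(scores) {
  const [label, confidence] = Object.entries(scores)
    .reduce((best, entry) => (entry[1] > best[1] ? entry : best));
  const [lo, hi] = LABEL_RANGES[label];
  return { label, confidence, level: Math.round((lo + hi) / 2) };
}

console.log(cheeseFromScores({
  "no cheese": 0.005,
  "a little cheese": 0.005,
  "normal cheese": 0.01,
  "lots of cheese": 0.98,
})); // level 9 ("lots of cheese")
```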


Thanks a lot for mentioning this. Apologies, I have not had time to check it out yet, but I will very soon :slight_smile:

My approach would be as follows.
First, use ChatGPT to generate some synthetic data: “You are ordering a pizza, and want 7/10 amount of cheese”

Record the response, along with the fact that you asked for 7/10. Do this about a hundred times, then change the number to 4/10, and so on. In the end, you have about 1000 pairs of a numeric rating (e.g. 5/10) and the generated response.
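The generation step could be scripted like this. `buildGenerationJobs` is a hypothetical helper that only builds the instructions; each one would then be sent to a chat model and its reply stored next to `level`. The API call is left out, and the extra “Write only what you would say…” sentence is an assumption to keep replies short. Levels 0 through 10 make 11 levels, so 100 per level yields 1100 pairs.

```javascript
// Build one generation instruction per (level, repetition) pair.
function buildGenerationJobs(perLevel) {
  const jobs = [];
  for (let level = 0; level <= 10; level++) {
    for (let i = 0; i < perLevel; i++) {
      jobs.push({
        level,
        instruction:
          `You are ordering a pizza, and want ${level}/10 amount of cheese. ` +
          "Write only what you would say to the pizza shop.",
      });
    }
  }
  return jobs;
}

const jobs = buildGenerationJobs(100);
console.log(jobs.length); // 1100
```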

Put these pairs in your JSONL file in reverse, so the response is the prompt and the number is the completion:

{"prompt": "a lot, like a WHOLE LOT of cheese", "completion": "8"}
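Writing the reversed pairs out as JSONL might look like this. The `\n\n###\n\n` separator at the end of each prompt and the leading space in each completion follow the conventions of the legacy prompt/completion fine-tuning format; treat them as assumptions to check against the current OpenAI documentation.

```javascript
// Separator marking where the prompt ends (legacy fine-tuning convention; verify).
const SEPARATOR = "\n\n###\n\n";

// Turn (level, response) pairs into JSONL lines, reversed so the customer's
// text is the prompt and the number is the completion.
function toJsonl(pairs) {
  return pairs
    .map(({ level, response }) =>
      JSON.stringify({ prompt: response.trim() + SEPARATOR, completion: ` ${level}` }))
    .join("\n");
}

const jsonl = toJsonl([
  { level: 8, response: "a lot, like a WHOLE LOT of cheese" },
  { level: 0, response: "no cheese at all, please" },
]);
console.log(jsonl);
```

JSON.stringify escapes the newlines inside the separator, so each example still occupies exactly one line of the file.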

That should let you fine-tune a model that expects a pizza order as a description and attempts to output a number (out of ten) as its completion.
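On the shop side, the raw completion still has to be turned into a number. A minimal sketch, assuming completions look like " 8": anything that is not a whole number from 0 to 10 returns `null` so the caller can fall back, for example by asking the customer directly.

```javascript
// Parse a completion such as " 8" or "10\n" into an integer cheese level.
// Returns null for anything outside 0-10 or not a plain number.
function parseCheeseLevel(completion) {
  const match = completion.trim().match(/^\d{1,2}$/);
  if (!match) return null;
  const level = Number(match[0]);
  return level <= 10 ? level : null;
}

console.log(parseCheeseLevel(" 8"));
```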

Potential problems: you’ll have to look over each generated response to ensure it really did follow the requested value (for example, “only a little cheese please” would be incorrect for 6/10, and you should not put it in the fine-tuning JSONL). You’ll also need to engineer the prompt so that “a whole lot of pepperoni” doesn’t trigger a high cheese value.
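A crude filter can narrow down which generated pairs need a closer look. The word lists are hypothetical and will produce false positives (e.g. “a whole lot of pepperoni, light on the cheese” at a low level), so this only prioritizes the manual review rather than replacing it.

```javascript
// Hypothetical word lists; extend them after reading real generated data.
const LOW_WORDS = ["little", "light", "bit of", "easy on", "no cheese"];
const HIGH_WORDS = ["a lot", "lots", "extra", "loaded", "whole lot"];
const CHEESE_WORDS = ["cheese", "mozz", "parmesan"];

// Return the reasons a generated pair looks suspicious (empty array = no flags).
function flagForReview({ level, response }) {
  const text = response.toLowerCase();
  const has = (words) => words.some((w) => text.includes(w));
  const reasons = [];
  if (!has(CHEESE_WORDS)) reasons.push("no cheese mention");
  if (level >= 6 && has(LOW_WORDS)) reasons.push("low-cheese wording for a high level");
  if (level <= 4 && has(HIGH_WORDS)) reasons.push("high-cheese wording for a low level");
  return reasons;
}

console.log(flagForReview({ level: 6, response: "Only a little cheese please" }));
```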

I think that’s the broad strokes anyway: instruct the AI to generate the data you want, then use that in the fine-tuning .jsonl.
