Output from the samples

I tried the standard recipe generator example (by OpenAI) and got a strange result: it seemed to mix together whatever ingredients I entered. I don’t think this is how a recipe is supposed to work! I am worried about the potential for contamination if people follow AI-generated recipes. Any thoughts?

Write a recipe based on these ingredients and instructions:

cauliflower
chicken
milk
egg
cheese

Directions:

1. Preheat oven to 350 degrees Fahrenheit.
2. Cut cauliflower into small pieces.
3. In a bowl, mix together cauliflower, chicken, milk, egg, and cheese.
4. Pour mixture into a greased baking dish.
5. Bake for 30 minutes.

@Not_Wrogn is this using davinci-instruct, or another model? Instruct models are better at following instructions, so switching to one may help. The next step after that would be to add an example before your input to demonstrate the behavior you want.

If you want better guarantees on performance in front of users, finetuning your own model and serving it will give you the best outcome.

I am using the settings as provided in the example.

The prompt doesn’t guarantee you good completions for all inputs. Try steering it with some examples, and if that doesn’t work, try finetuning.
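Steering with examples could look something like the sketch below. It only builds the prompt text and makes no API call. The `###` separator, the pasta example, and the helper names `format_block` and `build_few_shot_prompt` are my own assumptions for illustration, not part of the OpenAI sample.

```python
# Minimal few-shot prompt builder (a sketch, not the OpenAI sample's code).
HEADER = "Write a recipe based on these ingredients and instructions:"

# Hypothetical worked example; a real one should be a recipe someone has checked.
EXAMPLES = [
    {
        "ingredients": ["pasta", "tomato", "garlic"],
        "directions": (
            "1. Boil the pasta until tender.\n"
            "2. Simmer chopped tomato and garlic into a sauce.\n"
            "3. Toss the pasta with the sauce."
        ),
    }
]


def format_block(ingredients, directions=""):
    """Format one recipe in the same layout as the original prompt."""
    block = HEADER + "\n\n" + "\n".join(ingredients) + "\n\nDirections:"
    if directions:
        block += "\n\n" + directions.strip()
    return block


def build_few_shot_prompt(examples, ingredients):
    """Put the worked examples first, then the user's input left open."""
    blocks = [format_block(ex["ingredients"], ex["directions"]) for ex in examples]
    blocks.append(format_block(ingredients))
    # "###" is an assumed separator between examples.
    return "\n\n###\n\n".join(blocks)


prompt = build_few_shot_prompt(
    EXAMPLES, ["cauliflower", "chicken", "milk", "egg", "cheese"]
)
print(prompt)
```

The prompt ends on an open `Directions:` line, so the completion continues in the style of the example instead of defaulting to "mix everything together".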

Yes, @m-a.schenk, I have seen that warning :slight_smile: I think people don’t mind if AI occasionally generates gibberish, but when it comes to food that people will eat, it can be risky. Whoever runs the recipe generator is not necessarily qualified to tell whether a generated recipe is safe to eat.

@asabet, I think it matters less how I try to improve my results, whether by using better inputs, fine-tuning, or trying a different engine. The problem is whether the generated recipe is safe to eat, especially if it is a complicated one.


@Not_Wrogn it should be self-evident that outputs from GPT-3 aren’t guaranteed to be valid (for any application). The whole point of ‘improving results’ is to investigate the limiting behavior and see where it succeeds and fails. This is only possible with finetuned models. Before making statements about what is and isn’t possible, it’s more productive to actually build a dataset of valid recipes, finetune, then evaluate the results.

Based on correlations alone, a finetuned model should be able to generate some valid recipes. Whether this is 5% or 50% depends on the size and quality of the dataset. Evaluating the results should reveal failure cases and possible solutions.

Your problem is not dissimilar to program synthesis and evaluation (a recipe is a type of program). Two things that have worked for that problem:

  • Sample, filter, and rank. In the Codex paper, the authors got great results by generating lots of samples (e.g. n=100), then filtering and ranking them based on log likelihood and automated tests. What are some ways to rank and automatically test recipes?
  • Decompose the problem. In the Scratchpad paper, what succeeded was breaking the problem down into steps with an intermediate state, then outputting the final result. It might be helpful to break recipe generation into intermediate steps, where you can check and guarantee that each intermediate result is valid (e.g. with heuristics).
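As a rough sketch of the sample, filter, and rank idea applied to recipes: the candidates below are hard-coded stand-ins for model samples, and the two checks (`no_unlisted_ingredients`, `has_cooking_step`) are hypothetical heuristics I made up for illustration. They are nowhere near a food-safety test, and the `Candidate` class and vocabularies are assumptions too.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    mean_logprob: float  # average token log-probability of the sample


ALLOWED = {"cauliflower", "chicken", "milk", "egg", "cheese"}
# Assumed vocabulary of ingredient names the filter knows about.
KNOWN_INGREDIENTS = ALLOWED | {"butter", "flour"}
COOKING_VERBS = {"bake", "roast", "boil", "fry", "simmer"}


def words(text):
    """Lowercase word set, ignoring punctuation and digits."""
    return set(re.findall(r"[a-z]+", text.lower()))


def no_unlisted_ingredients(text):
    """Reject recipes that use a known ingredient the user did not supply."""
    return not ((KNOWN_INGREDIENTS - ALLOWED) & words(text))


def has_cooking_step(text):
    """Reject recipes that never apply heat (crude check for raw chicken)."""
    return bool(COOKING_VERBS & words(text))


def filter_and_rank(candidates, checks):
    """Keep candidates passing every check, most likely first."""
    passing = [c for c in candidates if all(check(c.text) for check in checks)]
    return sorted(passing, key=lambda c: c.mean_logprob, reverse=True)


# Stand-ins for n samples returned by the model.
candidates = [
    Candidate("Mix cauliflower, chicken, milk, egg, and cheese. Bake for 30 minutes.", -0.4),
    Candidate("Mix cauliflower and chicken with butter. Bake for 30 minutes.", -0.2),
    Candidate("Roast cauliflower and chicken. Top with cheese and bake for 20 minutes.", -0.6),
    Candidate("Stir everything together and serve.", -0.1),
]

ranked = filter_and_rank(candidates, [no_unlisted_ingredients, has_cooking_step])
for c in ranked:
    print(c.mean_logprob, c.text)
```

Note that the two most likely samples are both filtered out here, which is the point of running automated checks before ranking by likelihood.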

Well @asabet, I have no doubt that it would improve the result, but my concern is a different one. A recipe that looks valid after mixing, matching, and finetuning might still not be valid in the sense of being edible :slight_smile:. I wouldn’t expect anyone to blindly use AI-generated recipes for cooking unless they are validated by a food expert or a qualified chef.


Yeah @m-a.schenk, not sure :thinking:. Check out the site Supercook. They already have a set of recipes and, based on what you have, they show the list of recipes that can be cooked with exactly those ingredients. They don’t mix and match. They won’t show recipes where you are missing one of the ingredients. I think this is what I am trying to say: without fine-tuning on validated recipes, using the raw language capability of GPT-3 to generate recipes is not a good idea. That is what the sample on OpenAI is doing.
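To make the contrast concrete, here is a minimal sketch of that lookup-style approach: only recipes whose full ingredient list is covered by what you have are shown. I don't know how Supercook actually implements this; the recipe data and the `cookable` function are made up for illustration.

```python
# Hypothetical set of already-validated recipes, mapped to required ingredients.
RECIPES = {
    "cauliflower cheese": {"cauliflower", "milk", "cheese"},
    "chicken omelette": {"chicken", "egg", "cheese"},
    "chicken pot pie": {"chicken", "milk", "flour", "butter"},
}


def cookable(recipes, pantry):
    """Return names of recipes whose ingredients are all in the pantry."""
    return sorted(name for name, needed in recipes.items() if needed <= pantry)


pantry = {"cauliflower", "chicken", "milk", "egg", "cheese"}
print(cookable(RECIPES, pantry))
```

With this pantry, the pot pie is not listed because flour and butter are missing. Nothing is generated, so every result is a recipe that was already in the validated set.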