How would you structure solving the "Masterchef Mystery Box Challenge"?

For those familiar with the show, I’m working on a project that essentially emulates the Masterchef Mystery Box Challenge.

Goal: Generate a list of x recipes that honour the ingredients in the box, but that can utilise the rest of the pantry to create a delicious meal.

Key points:

  • I have a list of 20-30 grocery items “in the box”
  • I have access to a csv of 3000+ “staple pantry goods” (I’m aware the Masterchef contestants usually only get the basics)

My Current Process:
I’ve been playing around with different variations of this, but have basically got to a point where my process is:

  1. Give the model the mystery box and ask for x recipe titles, each using at least 1 ingredient from the box
  2. Manually delete any stupid titles or ingredients (it likes replacing tortillas with pancakes when the box includes pancakes but not tortillas)
  3. For each recipe title, generate a recipe that includes the box ingredients provided in step 1 and otherwise draws only on the inventory, then output it as structured JSON
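Part of the manual cleanup in step 2 could probably be automated once step 3 returns JSON. Here's a minimal sketch of what I mean; the JSON shape (`title`, plus `ingredients` as a list of objects with a `name`) and the function names are my own assumptions, not something the API dictates. It flags ingredients that are in neither the box nor the inventory, and checks that at least one box item is actually used. It won't catch odd substitutions like pancakes for tortillas, since pancakes are legitimately in the box.

```python
import json


def find_unstocked(recipe_json: str, box: set[str], inventory: set[str]) -> list[str]:
    """Return ingredient names found in neither the box nor the inventory.

    Assumes a hypothetical recipe shape:
    {"title": "...", "ingredients": [{"name": "..."}, ...]}
    """
    recipe = json.loads(recipe_json)
    allowed = {name.lower() for name in box | inventory}
    return [
        item["name"]
        for item in recipe["ingredients"]
        if item["name"].lower() not in allowed
    ]


def uses_box(recipe_json: str, box: set[str]) -> bool:
    """True if at least one ingredient comes from the mystery box."""
    recipe = json.loads(recipe_json)
    box_lower = {name.lower() for name in box}
    return any(item["name"].lower() in box_lower for item in recipe["ingredients"])


# Made-up example data, for illustration only
box = {"pancakes", "chicken thigh"}
inventory = {"olive oil", "salt"}
recipe = json.dumps({
    "title": "Chicken tacos",
    "ingredients": [
        {"name": "Chicken thigh"},
        {"name": "tortillas"},
        {"name": "olive oil"},
    ],
})

unstocked = find_unstocked(recipe, box, inventory)
has_box_item = uses_box(recipe, box)
```

Exact string matching is crude (it would miss "extra virgin olive oil" vs "olive oil"), so in practice the names would need normalising first.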

This seems to work okay.

My problem: The way I’m doing it now, I have to send 3k inventory items for each recipe request. So, if I have 10 titles, that’s 30,000 rows of a csv I’m sending… works out to about 750k tokens for 10 recipes. A little excessive.
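For reference, here's the back-of-the-envelope arithmetic behind those numbers. The 100-item shortlist at the end is purely a hypothetical figure to show how the cost would scale if each request only carried a trimmed inventory.

```python
# Figures taken from the post
rows_per_request = 3_000   # inventory rows sent with every recipe request
recipes = 10               # number of recipe titles
total_tokens = 750_000     # approximate tokens observed for all 10 recipes

rows_sent = rows_per_request * recipes        # 30000 rows in total
tokens_per_request = total_tokens // recipes  # 75000 tokens per recipe
tokens_per_row = total_tokens / rows_sent     # 25.0 tokens per CSV row

# Hypothetical: send only a 100-row shortlist with each request instead
shortlist_rows = 100
shortlist_total = int(shortlist_rows * tokens_per_row * recipes)  # 25000 tokens

print(rows_sent, tokens_per_request, tokens_per_row, shortlist_total)
```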

I know this is wrong. What I don’t know is how I should be doing it.

Simple Answer: “Don’t use the inventory”
I’ve been trying to get it to create creative dishes for a while based on what’s in the box alone and the results tend to be a bit mediocre, or produce things that aren’t stocked locally, or do weird substitutions. So far, providing the full inventory has helped resolve this issue…

So, while I may go back to this, I’d like to try and solve the problem of using the inventory (if only as a learning exercise to better understand the API’s capabilities).

Using the inventory:
So, in addition to reducing the inventory to a more essential list (I don’t need 20 different soy sauces), I believe I come back to these three options:

  1. Assistants API - in theory I should be able to attach a file to this and “somehow” reference this without having to burn tokens each time. But, my understanding is also that each message includes the previous messages, under the hood, and so I’m not sure this actually fixes the token issue? Plus, in playing with the Assistants API, I’ve found it SUPER unreliable - finally got the first iteration of this working and then just kept getting “can’t process this request” out of the blue, for no reason.
  2. Embeddings - Convert the csv rows into embeddings, store them, then reference the inventory that way. But my understanding is that this is more for Q&A, rather than for assembling a recipe based on ingredients. Is that something it can do?
  3. Fine-Tuning - my understanding is that I would need to upload a bunch of complete recipes for this to have any value, and that fine tuning wouldn’t really benefit from just a list of ingredients?
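To make option 2 concrete, my rough understanding is that embeddings wouldn't assemble the recipe themselves; they'd be used to pick the handful of inventory rows most relevant to each recipe title, and only that shortlist would go into the step 3 prompt. Below is a minimal sketch of that retrieval step. Note that `embed` here is a toy word-count stand-in I made up so the example runs offline; a real embedding model would replace it (and should match on meaning, e.g. "tacos" to "tortillas", rather than on shared words). The inventory rows are invented too.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def shortlist(query: str, inventory: list[str], k: int = 50) -> list[str]:
    """Return the k inventory rows most similar to the query text."""
    q = embed(query)
    ranked = sorted(inventory, key=lambda row: cosine(q, embed(row)), reverse=True)
    return ranked[:k]


# Made-up inventory rows, for illustration only
inventory = [
    "soy sauce light",
    "soy sauce dark",
    "plain flour",
    "corn tortillas",
    "olive oil",
]

top = shortlist("chicken tacos with corn tortillas", inventory, k=2)
```

With a real model, the inventory embeddings would be computed once and cached, so each recipe request only costs one query embedding plus the shortlist tokens rather than all 3k rows.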

So anyway, that’s what I’ve tried and where I’m at. I’d be super grateful to hear if any of you can shed some light on how you’d go about this, or thoughts on my process (it’s been a frustrating journey, please be kind… pancakes instead of tortillas :cry: ).