AI and Recipes! (Or structured data with LLMs)

Hello everyone !

I would like to share a cool project I'm working on. Of course it's cool because I fell in love with the problem; otherwise it might seem dull and painful.

For example, I spent one hour manually adding 1s to the is_fruit column because the extraction loop I tried (LangChain, Kor, GPT-3.5) wasn't accurate, and I didn't want to spend too much money on GPT-4.

It's my learning curve, my friend! The more pain I have, the more motivation I get to try new tools, haha.

My topic is automating my family's diet. The problem might already have been solved; let's just say it's training to upskill, so I'm here to learn.

→ Coding the linear programming part is going well.
→ I have two main databases (food, recipe).

The trickiest problem is:
From an online recipe → get each ingredient's name, quantity, quantity_unit, and transformation (cooked, …) → then translate the ingredient name into the naming of my ingredient database.
Example: recipe → (all purpose flour, 1, cup, boiled) → (Wheat refined flour cooked, 120, g, boiled)

For the conversion between units, I think a good old dict (this sentence had a dramatic start) is mandatory.
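A minimal sketch of what such a conversion dict could look like. All numbers here are illustrative (the flour density is picked so that 1 cup → 120 g, matching the example above); a real table would need verified values per ingredient:

```python
# Sketch of a unit-conversion table. Mass units convert directly;
# volume units need a per-ingredient density (g per ml), which is
# why volumes and densities live in separate dicts.

GRAMS_PER_UNIT = {
    "g": 1.0,
    "kg": 1000.0,
    "oz": 28.35,
    "lb": 453.6,
}

ML_PER_UNIT = {
    "ml": 1.0,
    "l": 1000.0,
    "cup": 240.0,   # US cup
    "tbsp": 15.0,
    "tsp": 5.0,
}

# Hypothetical density table (g per ml); rough example values only.
DENSITY_G_PER_ML = {
    "all purpose flour": 0.5,   # so 1 cup ~ 120 g
    "water": 1.0,
}

def to_grams(ingredient: str, quantity: float, unit: str) -> float:
    """Convert a (quantity, unit) pair to grams."""
    if unit in GRAMS_PER_UNIT:
        return quantity * GRAMS_PER_UNIT[unit]
    if unit in ML_PER_UNIT:
        ml = quantity * ML_PER_UNIT[unit]
        return ml * DENSITY_G_PER_ML[ingredient]
    raise KeyError(f"Unknown unit: {unit}")
```

For example, `to_grams("all purpose flour", 1, "cup")` gives 120.0, like the flour example above.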

For the ingredient translation (mapping) I have an idea. For example:
Loop over the ingredients:
Step 1: ask the LLM which category this food belongs to, from a given list (grains, seeds, fruit, …)
Step 2: ask the LLM which sub-category it belongs to, from another list

That narrows the full ingredient list (>1.5k, and growing) down to fewer than 100 candidates.
Step n: select the ingredient in that short list that matches the name of the input ingredient.
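That narrowing loop could be sketched like this. Here `ask_llm` is only a stand-in for the real LLM call (e.g. via the OpenAI API) with canned answers so the example runs on its own, and the tiny `DB` dict is a made-up stand-in for the real ingredient database:

```python
# Sketch of the category -> sub-category narrowing loop.

# Hypothetical ingredient database: name -> (category, sub_category).
DB = {
    "Wheat refined flour": ("grains", "flour"),
    "Whole wheat flour": ("grains", "flour"),
    "White rice": ("grains", "rice"),
    "Apple": ("fruit", "pome"),
}

def ask_llm(question: str, choices: list) -> str:
    """Stand-in for an LLM constrained to answer with one of `choices`.
    A real implementation would send the prompt to a chat model; this
    stub just hardcodes the answers needed for the demo."""
    canned = {"category": "grains", "sub-category": "flour"}
    for key, answer in canned.items():
        if key in question and answer in choices:
            return answer
    return choices[0]

def narrow(ingredient: str) -> list:
    # Step 1: pick a category from the list of all categories.
    categories = sorted({cat for cat, _ in DB.values()})
    cat = ask_llm(f"Which category is '{ingredient}' from? Options: {categories}",
                  categories)
    # Step 2: pick a sub-category within that category.
    subs = sorted({sub for c, sub in DB.values() if c == cat})
    sub = ask_llm(f"Which sub-category is '{ingredient}' from? Options: {subs}",
                  subs)
    # The candidate list is now small enough for a final matching step.
    return [name for name, (c, s) in DB.items() if c == cat and s == sub]
```

With the canned answers, `narrow("all purpose flour")` returns the two flour entries, ready for the final "step n" match.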

To do this I thought about using (LangChain, Kor, GPT-3.5) because it lets me get Yes/No or single-word answers. With a plain prompt I'm afraid I couldn't reliably extract the information from the free-text reply (maybe by playing with the max-tokens argument).
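If a structured-output library turns out to be a hassle, one lightweight alternative (a sketch, not how Kor actually works) is to prompt for a single word and then validate the reply against the allowed answers, retrying on failure:

```python
from typing import Optional

def parse_single_word(raw_answer: str, allowed: set) -> Optional[str]:
    """Return the normalized answer if it is one of `allowed`, else None.
    Strips whitespace and trailing punctuation so replies like 'Yes.'
    still validate against {'yes', 'no'}."""
    word = raw_answer.strip().strip(".!").lower()
    return word if word in allowed else None
```

On a `None` result you could retry the LLM call with a stricter prompt instead of trusting a malformed reply.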

However, I had to downgrade the openai package today to make Kor work, so maybe it's not the best way to do it anymore.
And I always rely on ChatGPT because it's convenient, but it can get expensive. I see a lot of people on LinkedIn using fancier LLMs. I have an OK laptop (GTX GPU, Ryzen 7), but my previous attempts at local models were a bit tricky (slow answers, 80 GB of storage, …).

I may have said a lot; I wanted to share with the forum, not just ask "how do I solve this or that". And everybody knows recipes, so this topic can help others, not just me on my crazy quest to automate my family's diet!

Thank you for reading !

Excited to hear your answers!

:eyes:

Looks interesting!

Have you considered using embeddings? It might be considerably cheaper. You get a vector back and compare it to the vectors of your categories to see if you have a match. If you have enough RAM, you might even be able to run some top-of-the-line models on your laptop :slight_smile:
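The idea above could look roughly like this. `embed()` is a stand-in for a real embedding model (e.g. an embeddings API call); the three-dimensional vectors here are made up purely so the sketch runs:

```python
import math

# Sketch of embedding-based ingredient matching via cosine similarity.
# FAKE_VECTORS stands in for vectors a real embedding model would return.
FAKE_VECTORS = {
    "all purpose flour":   [0.9, 0.1, 0.0],
    "Wheat refined flour": [0.8, 0.2, 0.0],
    "Apple":               [0.0, 0.1, 0.9],
}

def embed(text: str) -> list:
    return FAKE_VECTORS[text]

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def best_match(query: str, candidates: list) -> str:
    qv = embed(query)
    return max(candidates, key=lambda c: cosine(qv, embed(c)))
```

With these toy vectors, `best_match("all purpose flour", ["Wheat refined flour", "Apple"])` picks the flour entry, since its vector points in nearly the same direction.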

Hi Diet !

I like your nickname! So if I understand correctly, I keep step 1), and for step 2) I use embeddings + similarity search.

I can try that on a hand-made example, doing step 1) myself just to check it works. I could use Chroma, for example.

1 Like

I tried with GloVe; here is the mapping between the recipe names and my database: {'leftover white rice, preferably long-grain or Carolina Gold Cooked': 'Cooked white rice', 'Eggs Cooked': 'Cooked saithe', 'zucchini Cooked': 'Cooked zucchini', 'mint': 'Pepper mint', 'green onion': 'Red onion', 'sharp white Cheddar': 'Cow White cheese 0%', 'salt': 'Flower of salt', 'black pepper': 'Black pepper', 'butter': 'Peanut butter'}

I'm not a pro, but it looks promising. Should I keep pushing in this direction?

1 Like

It’s an option :slight_smile:

GloVe is just a simple word embedding model, if I recall correctly. You can leverage the vast knowledge of LLMs by using LLM-derived embedding models, such as OpenAI's text-embedding-3-large, for example. The Mistral-derived embedding models are even more powerful.

1 Like

I'm pushing in this direction with

EMBEDDING_MODEL = "text-embedding-3-small"

And I return the top 10 nearest embeddings. I think the right match will be in that list most of the time (let's push in this direction, at least that's what I believe in :red_car:).
And it's fast because I cache my embeddings and use pickle to retrieve them quickly.
The slow part is using the LLM to answer the question:
"This is an ingredient {ing}, and this ingredient is in the following list (under another name): {l_ing}. Find it."
Then I extract the ingredient's name from the answer (fast).
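The pickle cache + top-10 part could be sketched like this. Again, `embed()` is only a toy stand-in (a real one would call text-embedding-3-small); the cache logic is the point, the cache path and helper names are my own inventions:

```python
import heapq
import math
import os
import pickle
import tempfile

# Hypothetical cache location; pick whatever path suits your project.
CACHE_PATH = os.path.join(tempfile.gettempdir(), "ingredient_embedding_cache.pkl")

def embed(text: str) -> list:
    """Toy stand-in for an embedding API call: a normalized
    character-sum vector. Replace with a real model in practice."""
    vec = [0.0] * 8
    for i, ch in enumerate(text.lower()):
        vec[i % 8] += ord(ch)
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def load_cache() -> dict:
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH, "rb") as f:
            return pickle.load(f)
    return {}

def cached_embed(text: str, cache: dict) -> list:
    if text not in cache:
        cache[text] = embed(text)          # pay the (API) cost only once
        with open(CACHE_PATH, "wb") as f:  # persist for the next run
            pickle.dump(cache, f)
    return cache[text]

def top_k(query: str, candidates: list, k: int = 10) -> list:
    """Return the k candidates whose embeddings are nearest the query."""
    cache = load_cache()
    qv = cached_embed(query, cache)
    def sim(c):
        cv = cached_embed(c, cache)
        return sum(x * y for x, y in zip(qv, cv))
    return heapq.nlargest(k, candidates, key=sim)
```

Since the vectors are unit-normalized, the dot product in `sim` is the cosine similarity, and the pickle file means repeated runs skip re-embedding entirely.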

1 Like