I’m using `gpt-4.1` to identify food items from photos. My prompt instructs the model to visually identify foods in an image and return ingredient names.
The issue: When I send a photo of a white bowl containing pieces of chocolate and blueberries, the model consistently returns “blueberries” (correct) and “banana” (wrong — these are clearly chocolate pieces).
This isn’t random — it’s reproducible across multiple calls. The prompt emphasizes visual inference and says “Only identify food that is CLEARLY VISIBLE in the image.”
A few things I’m wondering:
- Is this a known issue with `gpt-4.1` and dark-colored foods?
- Would a different model (`gpt-4o`, etc.) handle this better?
- Any prompt engineering tips to reduce food identification hallucinations?
- You get what you pay for?
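In case it helps others with the same problem, here is a rough sketch of one mitigation I've seen suggested: ask the model for a per-item confidence score in structured JSON and filter low-confidence items client-side. The prompt wording, the JSON shape, and the 0.8 threshold below are all illustrative assumptions, not tested values.

```python
import json

# Illustrative prompt: request structured output with per-item confidence,
# and explicitly allow the model to express uncertainty instead of guessing.
PROMPT = (
    "Identify each food item that is clearly visible in the image. "
    'Return a JSON list of objects: {"name": str, "confidence": number 0-1}. '
    "If you are unsure what an item is, report a low confidence rather than "
    "guessing a similar-looking food."
)

def filter_confident(raw_json: str, threshold: float = 0.8) -> list[str]:
    """Keep only items the model reported with confidence >= threshold."""
    items = json.loads(raw_json)
    return [item["name"] for item in items if item["confidence"] >= threshold]

# Hypothetical model reply for the bowl in question:
reply = (
    '[{"name": "blueberries", "confidence": 0.95},'
    ' {"name": "banana", "confidence": 0.40}]'
)
print(filter_confident(reply))  # the low-confidence "banana" guess is dropped
```

This doesn't fix the underlying misidentification, but it at least gives you a knob to trade recall for precision on ambiguous, dark-colored items.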
Thank you for taking a look and any suggestions!