GPT-4o hallucinates IDs instead of choosing them from the provided list

Hello, I have an issue with GPT-4o “inventing” ingredient IDs, even though I provide a list from which it should choose in the prompt. Let me explain:

I’m using the text generation API to create recipes, where ingredients must be selected exclusively from the list I provide. The output is JSON generated through “structured output,” which contains the list of chosen ingredients. Here’s an example:

SYSTEM PROMPT:

You are a chef who creates recipes by selecting some ingredients from the following list. For each user request, provide a recipe by taking the ingredient IDs and their names only from the list below:

[
    {
        "Id": 345,
        "Name": "Flour"
    },
    {
        "Id": 123,
        "Name": "Eggs"
    },
    {
        "Id": 15,
        "Name": "Salt"
    },
    ...
]

OUTPUT SCHEMA:

{
    ...
    "schema": {
        "type": "object",
        "properties": {
            "ingredients": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "ingredient_id": { "type": "integer" },
                        "ingredient_name": { "type": "string" }
                    },
                    "required": ["ingredient_id", "ingredient_name"],
                    "additionalProperties": false
                }
            }
        },
        "required": ["ingredients"],
        "additionalProperties": false
    },
    "strict": true
    ...
}

USER PROMPT:

“Create a breakfast recipe for this morning.”

OUTPUT:

[
    ...
    {
        "ingredient_id": 3,
        "ingredient_name": "Flour"
    }
    ...
]

The issue is that while the ingredient name seems to be correct, about 10% of the time the ingredient_id is randomly generated and does not match any ID in the provided list.

Right now, I verify the names and correct the IDs based on the initial list (matching by name). However, this is just a workaround, and I suspect that the names might also be invented at times.
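
The name-based workaround described above could look roughly like the following minimal Python sketch. The function name `repair_ingredients` and the sample data (including the made-up "Unicorn dust" entry) are hypothetical, and it assumes names in the list are unique once trimmed and lowercased. Items whose name is not in the list are returned separately rather than guessed, which would also catch invented names.

```python
def repair_ingredients(allowed, returned):
    """Fix IDs by matching on name; collect items whose name is unknown.

    allowed:  list of {"Id": int, "Name": str} sent in the prompt
    returned: list of {"ingredient_id": int, "ingredient_name": str} from the model
    """
    # Assumption: names are unique after trimming and lowercasing.
    id_by_name = {item["Name"].strip().lower(): item["Id"] for item in allowed}

    repaired, unknown = [], []
    for item in returned:
        key = item["ingredient_name"].strip().lower()
        if key in id_by_name:
            # Trust the name and overwrite whatever ID the model produced.
            repaired.append({
                "ingredient_id": id_by_name[key],
                "ingredient_name": item["ingredient_name"],
            })
        else:
            # The name is not in the list, so it was likely invented: do not guess.
            unknown.append(item)
    return repaired, unknown


# Hypothetical sample data based on the example in this post.
allowed = [
    {"Id": 345, "Name": "Flour"},
    {"Id": 123, "Name": "Eggs"},
    {"Id": 15, "Name": "Salt"},
]
returned = [
    {"ingredient_id": 3, "ingredient_name": "Flour"},
    {"ingredient_id": 99, "ingredient_name": "Unicorn dust"},
]
repaired, unknown = repair_ingredients(allowed, returned)
```

A non-empty `unknown` list could be used as a signal to retry the request instead of silently dropping the item.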

My questions:

  • Is there a way to increase the probability that the ID matches exactly with the one from the list?
  • Are there other methods I can use to improve the result?

Thanks!

Why bother sending IDs to the model and paying for the extra tokens?

Simply send the list of ingredient names and get the answer back as a JSON list. Then look up the ID for every ingredient in the list you receive.

I’d also recommend reducing the temperature if you’re seeing hallucinations.

Hi, thanks for answering. I’m not sure I understand how I can send the list without the IDs and not have the same problem with another matching property, e.g. the ingredient name (I already tried, and GPT seems to make up some names as well).
Without the IDs, the list is just a list of names, and I still have to enforce that the ingredients are chosen only from those.
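
One way to enforce this at the schema level, rather than only in the prompt, might be to list the allowed values as an `enum` in the output schema. The sketch below only builds the schema dictionary in Python and makes no API call. `build_schema` is a hypothetical helper, and it assumes that the structured output feature accepts `enum` in strict mode and that the ingredient list fits within whatever limit applies to enum values. Constraining the ID and the name separately does not guarantee that they match as a pair, so a lookup by ID afterwards is still worthwhile.

```python
import json


def build_schema(allowed):
    """Return the output schema from the original post with enum constraints added."""
    ids = [item["Id"] for item in allowed]
    names = [item["Name"] for item in allowed]
    return {
        "type": "object",
        "properties": {
            "ingredients": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        # Only values from the provided list are valid.
                        "ingredient_id": {"type": "integer", "enum": ids},
                        "ingredient_name": {"type": "string", "enum": names},
                    },
                    "required": ["ingredient_id", "ingredient_name"],
                    "additionalProperties": False,
                },
            }
        },
        "required": ["ingredients"],
        "additionalProperties": False,
    }


# Hypothetical sample data based on the example in the original post.
allowed = [
    {"Id": 345, "Name": "Flour"},
    {"Id": 123, "Name": "Eggs"},
    {"Id": 15, "Name": "Salt"},
]
schema = build_schema(allowed)
print(json.dumps(schema, indent=2))
```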

I need to match the ingredients against my DB (where I have a longer list of properties associated with each ingredient).

I am currently providing the list as JSON. I was wondering if using CSV might help, or maybe returning the list from a function call, or using an id:object mapping instead…
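
For comparison, the CSV and id-to-name mapping formats mentioned above can be generated from the same list, as in this minimal Python sketch. The helper names `to_csv` and `to_id_mapping` are hypothetical, and whether either format actually reduces invented IDs is not established here; it would need to be measured.

```python
import csv
import io
import json


def to_csv(allowed):
    """Render the ingredient list as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Id", "Name"])
    for item in allowed:
        writer.writerow([item["Id"], item["Name"]])
    return buf.getvalue()


def to_id_mapping(allowed):
    """Render the list as a JSON object keyed by ID.

    JSON object keys must be strings, so the integer IDs are converted.
    """
    return json.dumps({str(item["Id"]): item["Name"] for item in allowed})


# Hypothetical sample data based on the example in the original post.
allowed = [
    {"Id": 345, "Name": "Flour"},
    {"Id": 123, "Name": "Eggs"},
    {"Id": 15, "Name": "Salt"},
]
csv_text = to_csv(allowed)
mapping_text = to_id_mapping(allowed)
```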

I’ll try lowering the temperature, too.