GPT-4o hallucinates, inventing IDs that are not in the provided list

bragma · February 10, 2025, 3:11pm

Hello, I have an issue with GPT-4o “inventing” ingredient IDs, even though I provide a list from which it should choose in the prompt. Let me explain:

I’m using the text generation API to create recipes, where ingredients must be selected exclusively from the list I provide. The output is JSON generated through “structured output,” and it contains the list of chosen ingredients. Here’s an example:

SYSTEM PROMPT:

You are a chef who creates recipes by selecting some ingredients from the following list. For each user request, provide a recipe by taking the ingredient IDs and their names only from the list below:

[
    {
        "Id": 345,
        "Name": "Flour"
    },
    {
        "Id": 123,
        "Name": "Eggs"
    },
    {
        "Id": 15,
        "Name": "Salt"
    },
    ...
]

OUTPUT SCHEMA:

{
    ...
    "schema": {
        "type": "object",
        "properties": {
            "ingredients": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "ingredient_id": { "type": "integer" },
                        "ingredient_name": { "type": "string" }
                    },
                    "required": ["ingredient_id", "ingredient_name"],
                    "additionalProperties": false
                }
            }
        },
        "required": ["ingredients"],
        "additionalProperties": false
    },
    "strict": true
    ...
}

USER PROMPT:

“Create a breakfast recipe for this morning.”

OUTPUT:

[
    ...
    {
        "ingredient_id": 3,
        "ingredient_name": "Flour"
    }
    ...
]

The issue is that while the ingredient name seems to be correct, about 10% of the time the ingredient_id is randomly generated and does not match any ID in the provided list.

Right now, I verify the names and correct the IDs based on the initial list (matching by name). However, this is just a workaround, and I suspect that the names might also be invented at times.
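
The name-based workaround described above could look roughly like this. It is a minimal sketch, not the code actually in use: the function name `repair_ids` and the sample `CATALOG` are hypothetical, and it assumes names in the list are unique and matched exactly. Items whose name is not in the list are returned separately, which also catches invented names.

```python
# Sample list mirroring the one in the system prompt (illustrative only).
CATALOG = [
    {"Id": 345, "Name": "Flour"},
    {"Id": 123, "Name": "Eggs"},
    {"Id": 15, "Name": "Salt"},
]


def repair_ids(ingredients, catalog):
    """Overwrite each returned ID with the catalog ID for the same name.

    Returns (repaired, unknown): 'repaired' holds items whose name exists
    in the catalog, 'unknown' holds items whose name was not found.
    """
    id_by_name = {item["Name"]: item["Id"] for item in catalog}
    repaired, unknown = [], []
    for ing in ingredients:
        name = ing["ingredient_name"]
        if name in id_by_name:
            # Trust the name, ignore whatever ID the model produced.
            repaired.append({"ingredient_id": id_by_name[name],
                             "ingredient_name": name})
        else:
            # The name itself is not in the list, so it cannot be repaired.
            unknown.append(ing)
    return repaired, unknown


model_output = [{"ingredient_id": 3, "ingredient_name": "Flour"}]
repaired, unknown = repair_ids(model_output, CATALOG)
```

A non-empty `unknown` list is a signal to retry the request or drop those items rather than store them.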

My questions:

Is there a way to increase the probability that the ID matches exactly with the one from the list?
Are there other methods I can use to improve the result?

Thanks!

sps · February 10, 2025, 3:28pm

Why bother sending IDs to the model and incurring the extra token cost?

Simply send the list of ingredient names and get the answer back as a JSON list. Then look up the ID for every ingredient in the list you receive.

I’d also recommend reducing temperature if you’re seeing hallucinations.
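
A rough sketch of that names-only round trip, under some assumptions: `names_for_prompt` and `resolve_names` are hypothetical helper names, the catalog records stand in for the database rows, and matching ignores case and surrounding whitespace. The API call itself is left out.

```python
def names_for_prompt(catalog):
    """Render only the ingredient names, one per line, for the prompt."""
    return "\n".join(f"- {item['Name']}" for item in catalog)


def resolve_names(names, catalog):
    """Map names returned by the model back to full catalog records.

    Returns (found, missing): 'found' is a list of catalog records,
    'missing' is a list of returned names with no catalog match.
    """
    by_key = {item["Name"].strip().casefold(): item for item in catalog}
    found, missing = [], []
    for name in names:
        record = by_key.get(name.strip().casefold())
        if record is not None:
            found.append(record)
        else:
            missing.append(name)
    return found, missing


catalog = [
    {"Id": 345, "Name": "Flour"},
    {"Id": 123, "Name": "Eggs"},
    {"Id": 15, "Name": "Salt"},
]
prompt_list = names_for_prompt(catalog)
found, missing = resolve_names(["flour ", "Butter"], catalog)
```

Because the IDs never leave the application, the model cannot invent them. Invented names still show up in `missing` and need handling.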

bragma · February 10, 2025, 3:38pm

Hi, thanks for answering. I’m not sure how I can send the list without the IDs without running into the same problem on whichever property I match by instead, e.g. the ingredient name (I already tried, and GPT seems to make up some names as well).
Without the IDs, the list is just a list of names, and I still have to enforce that ingredients are chosen only from it.

I need to match the ingredients from my DB (where I have a longer list of properties associated to each ingredient).

I am currently providing the list as JSON. I was wondering whether using CSV might help, or maybe returning the list from a function call, or using an id:object mapping instead…

I’ll try lowering the temperature, too.
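
One further option, not raised in the thread and offered here only as an assumption: if the structured-output mode in use accepts `enum` on the item properties, the schema can be generated from the ingredient list so that only listed IDs and names are allowed. The sketch below builds such a schema with the same shape as the one posted above; `build_schema` is a hypothetical helper, and very long lists may exceed whatever enum size limits the API imposes.

```python
import json


def build_schema(catalog):
    """Build the output schema with IDs and names restricted to the catalog."""
    allowed_ids = sorted({item["Id"] for item in catalog})
    allowed_names = sorted({item["Name"] for item in catalog})
    return {
        "type": "object",
        "properties": {
            "ingredients": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        # Only values present in the catalog are permitted.
                        "ingredient_id": {"type": "integer",
                                          "enum": allowed_ids},
                        "ingredient_name": {"type": "string",
                                            "enum": allowed_names},
                    },
                    "required": ["ingredient_id", "ingredient_name"],
                    "additionalProperties": False,
                },
            }
        },
        "required": ["ingredients"],
        "additionalProperties": False,
    }


catalog = [
    {"Id": 345, "Name": "Flour"},
    {"Id": 123, "Name": "Eggs"},
    {"Id": 15, "Name": "Salt"},
]
schema = build_schema(catalog)
schema_json = json.dumps(schema, indent=4)  # ready to embed in the request
```

An enum only guarantees that each value comes from the list. It does not guarantee that an ID and a name in the same item belong together, so a pair check after the response is still worthwhile.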
