Hello, I have an issue with GPT-4o “inventing” ingredient IDs, even though the prompt includes a list from which it should choose. Let me explain:
I’m using the text generation API to create recipes whose ingredients must be selected exclusively from the list I provide. The output is JSON generated through “structured output,” and it contains an array of the chosen ingredients. Here’s an example:
SYSTEM PROMPT:
You are a chef who creates recipes by selecting some ingredients from the following list. For each user request, provide a recipe by taking the ingredient IDs and their names only from the list below:
[
{
"Id": 345,
"Name": "Flour"
},
{
"Id": 123,
"Name": "Eggs"
},
{
"Id": 15,
"Name": "Salt"
},
...
]
OUTPUT SCHEMA:
{
...
"schema": {
"type": "object",
"properties": {
"ingredients": {
"type": "array",
"items": {
"type": "object",
"properties": {
"ingredient_id": { "type": "integer" },
"ingredient_name": { "type": "string" }
},
"required": ["ingredient_id", "ingredient_name"],
"additionalProperties": false
}
}
},
"required": ["ingredients"],
"additionalProperties": false
},
"strict": true
...
}
USER PROMPT:
“Create a breakfast recipe for this morning.”
OUTPUT:
[
...
{
"ingredient_id": 3,
"ingredient_name": "Flour"
}
...
]
The issue is that, although the ingredient name appears to be correct, in about 10% of cases the ingredient_id is made up and does not match any ID in the provided list (in the example above, Flour comes back with ID 3 instead of 345).
Right now, I verify the names and correct the IDs based on the initial list (matching by name). However, this is just a workaround, and I suspect that the names might also be invented at times.
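To make the workaround concrete, here is a minimal Python sketch of that kind of name-based check. It is an illustration rather than my exact code: the function name repair_ingredients, the sample model output, and the "Butter" entry are made up, and it assumes names can be matched after trimming whitespace and lowercasing.

```python
from typing import Dict, List, Tuple

# Copy of the ingredient list sent in the system prompt (shortened).
ALLOWED: List[Dict] = [
    {"Id": 345, "Name": "Flour"},
    {"Id": 123, "Name": "Eggs"},
    {"Id": 15, "Name": "Salt"},
]


def repair_ingredients(
    items: List[Dict], allowed: List[Dict]
) -> Tuple[List[Dict], List[Dict]]:
    """Overwrite each ID with the one from the list, matching by name.

    Returns (repaired, rejected); items whose name is not in the
    list are rejected, since there is no trustworthy ID for them.
    """
    # Assumption: names are unique once trimmed and lowercased.
    id_by_name = {e["Name"].strip().lower(): e["Id"] for e in allowed}
    name_by_id = {e["Id"]: e["Name"] for e in allowed}

    repaired, rejected = [], []
    for item in items:
        key = item["ingredient_name"].strip().lower()
        if key in id_by_name:
            correct_id = id_by_name[key]
            repaired.append(
                {
                    "ingredient_id": correct_id,
                    "ingredient_name": name_by_id[correct_id],
                }
            )
        else:
            # The name itself was invented, so it cannot be repaired.
            rejected.append(item)
    return repaired, rejected


# Hypothetical model output: one wrong ID, one correct, one invented name.
model_output = [
    {"ingredient_id": 3, "ingredient_name": "Flour"},
    {"ingredient_id": 123, "ingredient_name": "Eggs"},
    {"ingredient_id": 77, "ingredient_name": "Butter"},
]
repaired, rejected = repair_ingredients(model_output, ALLOWED)
```

With this sample, the Flour entry gets its ID corrected to 345, while the Butter entry ends up in rejected, which is exactly the case the workaround cannot fix.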
My questions:
- Is there a way to increase the probability that each ID exactly matches the corresponding one in the list?
- Are there other methods I can use to improve the result?
Thanks!