Issue with content filtering on recipes with JSON mode

I’m building a recipe app and I’m running into false positives from the content filter. I’m giving GPT-4 the text of a recipe and asking it to return the recipe as a standardized JSON object. For one particular recipe the model stops early with content_filter as the finish reason, and it always stops when it reaches the word “oven”. Here’s what the output looks like every time:

{
    "Baked Risotto With Greens and Peas": {
        "servings": 4,
        "cook_time": "30 minutes",
        "ingredients": [
            {"item": "extra-virgin olive oil", "amount": 2, "unit": "tablespoons"},
            {"item": "yellow onion", "amount": 0.5, "unit": "cup"},
            {"item": "garlic clove", "amount": 1, "unit": "small"},
            {"item": "Arborio rice", "amount": 1, "unit": "cup"},
            {"item": "Kosher salt", "amount": null, "unit": null},
            {"item": "black pepper", "amount": null, "unit": null},
            {"item": "green or lacinato kale", "amount": 4, "unit": "ounces"},
            {"item": "low-sodium chicken broth", "amount": 3.5, "unit": "cups"},
            {"item": "baby spinach", "amount": 4, "unit": "ounces"},
            {"item": "frozen peas", "amount": 1, "unit": "cup"},
            {"item": "grated Parmesan", "amount": 0.75, "unit": "cup"},
            {"item": "unsalted butter", "amount": 3, "unit": "tablespoons"},
            {"item": "lemon juice", "amount": 1, "unit": "tablespoon"}
        ],
        "instructions": [
            "Preheat the

Here’s my system prompt:

You will be given text from a website containing a recipe.
Your task is to extract the recipe from the provided text and transform it into a standardized JSON format.
Make sure to include all information from the original recipe. Do not summarize the original instructions.

Here's an example of the expected structure of your output:

{
    "Classic Chocolate Chip Cookies": {
        "servings": 24,
        "cook_time": "25 minutes",
        "ingredients": [
            {"item": "all-purpose flour", "amount": 2.25, "unit": "cups"},
            {"item": "baking soda", "amount": 1, "unit": "teaspoon"},
            // ... (more ingredients)
        ],
        "instructions": [
            "Preheat oven to 375°F (190°C).",
            "In a small bowl, combine flour, baking soda, and salt. Set aside.",
            // ... (more instructions)
        ],
        "notes": [
            "For best results, use room temperature ingredients.",
            // ... (more notes)
        ]
    }
}

Your output should be a JSON object with the above structure.
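
In case it’s relevant, here’s roughly how I’m making the call (a minimal sketch, assuming the openai-python v1 client; SYSTEM_PROMPT and recipe_text stand in for the prompt above and the scraped page text):

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-1106-preview",  # JSON mode needs one of the turbo models
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": recipe_text},
    ],
)

choice = response.choices[0]
print(choice.finish_reason)    # "content_filter" every time for this recipe
print(choice.message.content)  # cuts off mid-instruction, as shown above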

Any ideas for how to fix this?


I’m facing the same issue. When I add text containing the word enlargment in any context (e.g. font enlargment), the request times out.

The above may be a clue. The content filter will quickly stop on output patterns like song lyrics, for example; perhaps the check runs on AI output extending beyond what you see, which is then cut off and replaced with a finish reason.

This may not be a “copyright database” per se, but the filter seems to fire any time the AI gets to reproducing sequences exactly from its training corpus.

I couldn’t organically trigger a content filter stop, even with your misspelling. Do you have a particular sequence the AI would be trying to write, so that you could say “repeat this text back to me verbatim” as a test?
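
Something along these lines would isolate it (a sketch, assuming the same openai-python v1 client; suspect_text is whatever passage you think the model is reproducing):

response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{
        "role": "user",
        "content": "Repeat this text back to me verbatim:\n\n" + suspect_text,
    }],
)
# a "content_filter" finish reason here would confirm the
# verbatim-reproduction theory
print(response.choices[0].finish_reason)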


In this message I had previously provided faulty examples from memory. I’ll try to reproduce the issue, but it might have been API timeouts from openai-python on my side.

Thanks for the response. I removed any reference to a website in case the problem was related to web scraping, but that had no effect. It looks like you’re right that the model is objecting to reproducing output from its training corpus; the recipe in question is likely in the training set.

If I ask it to translate the recipe into Spanish, it gives me the full output just fine. Just for fun, I asked it to translate the recipe into Spanish and then back into English, and that triggered the content filter again. It looks like I won’t be able to use these models for this purpose.
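
Here’s roughly what that round-trip looked like (a sketch, using the same client and recipe_text as in my first message; prompts abbreviated):

to_spanish = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user",
               "content": "Translate this recipe into Spanish:\n\n" + recipe_text}],
)
print(to_spanish.choices[0].finish_reason)  # "stop": full output, no filter

back_to_english = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user",
               "content": "Translate this recipe into English:\n\n"
                          + to_spanish.choices[0].message.content}],
)
print(back_to_english.choices[0].finish_reason)  # "content_filter" again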

Thanks for your help!