Issue with content filtering on recipes with JSON mode

I’m building a recipe app and I’m running into false positives from the content filter. I’m giving GPT-4 the text of a recipe and asking it to return the recipe as a standardized JSON object. For one particular recipe the model stops early with content_filter as the finish reason, and it always stops when it reaches the word “oven”. Here’s what the output looks like every time:

{
    "Baked Risotto With Greens and Peas": {
        "servings": 4,
        "cook_time": "30 minutes",
        "ingredients": [
            {"item": "extra-virgin olive oil", "amount": 2, "unit": "tablespoons"},
            {"item": "yellow onion", "amount": 0.5, "unit": "cup"},
            {"item": "garlic clove", "amount": 1, "unit": "small"},
            {"item": "Arborio rice", "amount": 1, "unit": "cup"},
            {"item": "Kosher salt", "amount": null, "unit": null},
            {"item": "black pepper", "amount": null, "unit": null},
            {"item": "green or lacinato kale", "amount": 4, "unit": "ounces"},
            {"item": "low-sodium chicken broth", "amount": 3.5, "unit": "cups"},
            {"item": "baby spinach", "amount": 4, "unit": "ounces"},
            {"item": "frozen peas", "amount": 1, "unit": "cup"},
            {"item": "grated Parmesan", "amount": 0.75, "unit": "cup"},
            {"item": "unsalted butter", "amount": 3, "unit": "tablespoons"},
            {"item": "lemon juice", "amount": 1, "unit": "tablespoon"}
        ],
        "instructions": [
            "Preheat the

Here’s my system prompt:

You will be given text from a website containing a recipe.
Your task is to extract the recipe from the provided text and transform it into a standardized JSON format.
Make sure to include all information from the original recipe. Do not summarize the original instructions.

Here's an example of the expected structure of your output:

{
    "Classic Chocolate Chip Cookies": {
        "servings": 24,
        "cook_time": "25 minutes",
        "ingredients": [
            {"item": "all-purpose flour", "amount": 2.25, "unit": "cups"},
            {"item": "baking soda", "amount": 1, "unit": "teaspoon"},
            // ... (more ingredients)
        ],
        "instructions": [
            "Preheat oven to 375°F (190°C).",
            "In a small bowl, combine flour, baking soda, and salt. Set aside.",
            // ... (more instructions)
        ],
        "notes": [
            "For best results, use room temperature ingredients.",
            // ... (more notes)
        ]
    }
}

Your output should be a JSON object with the above structure.
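
In case it’s relevant, here’s roughly how I’m making the call (a minimal sketch, assuming the openai-python v1 client; SYSTEM_PROMPT and recipe_text stand in for the prompt above and the scraped page text):

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-1106-preview",  # JSON mode needs one of the turbo models
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": recipe_text},
    ],
)

choice = response.choices[0]
print(choice.finish_reason)    # "content_filter" every time for this recipe
print(choice.message.content)  # cuts off mid-instruction, as shown above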

Any ideas for how to fix this?


I’m facing the same issue. When I add text containing the word enlargment in any context (e.g. font enlargment), the request times out.

The above may be a clue. The content filter will quickly stop on output patterns like song lyrics, for example; perhaps the check runs on AI output extending beyond what you see, which is then cut off and replaced with a finish reason.

This may not be a “copyright database” per se, but the filter seems to fire any time the AI gets to reproducing sequences exactly from its training corpus.

I couldn’t organically trigger a content filter stop, even with your misspelling. Do you have a particular sequence the AI would be trying to write, so that you could say “repeat this text back to me verbatim” as a test?
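
Something along these lines would isolate it (a sketch, assuming the same openai-python v1 client; suspect_text is whatever passage you think the model is reproducing):

response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{
        "role": "user",
        "content": "Repeat this text back to me verbatim:\n\n" + suspect_text,
    }],
)
# a "content_filter" finish reason here would confirm the
# verbatim-reproduction theory
print(response.choices[0].finish_reason)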


In this message I had previously provided faulty examples from memory. I’ll try to reproduce the issue, but it might have been API timeouts from openai-python on my side.

Thanks for the response. I removed any reference to a website in case the problem was related to web scraping, but that had no effect. It looks like you’re right that the model is objecting to reproducing output from its training corpus; the recipe in question is likely in the training set.

If I ask it to translate the recipe into Spanish, it gives me the full output just fine. Just for fun, I asked it to translate the recipe into Spanish and then back into English, and that triggered the content filter again. It looks like I won’t be able to use these models for this purpose.
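
Here’s roughly what that round-trip looked like (a sketch, using the same client and recipe_text as in my first message; prompts abbreviated):

to_spanish = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user",
               "content": "Translate this recipe into Spanish:\n\n" + recipe_text}],
)
print(to_spanish.choices[0].finish_reason)  # "stop": full output, no filter

back_to_english = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user",
               "content": "Translate this recipe into English:\n\n"
                          + to_spanish.choices[0].message.content}],
)
print(back_to_english.choices[0].finish_reason)  # "content_filter" again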

Thanks for your help!