Improperly Escaped Quotes in Returned JSON Values

I’m working with a function to identify the inherent polarity of an article. When the function returns a JSON string that contains quoted phrases within its values, it results in an invalid JSON string because the internal quotes aren’t properly escaped.

For instance, consider this example returned value:

"{\n \"polarity\": \"good\",\n \"explanation\": \"This article is considered \"good\" because ...\"\n}"

Calling json.loads() on this string raises a JSONDecodeError. The unescaped internal quotes within the “explanation” value make the JSON invalid.

For the JSON string to be valid, these internal quotes need one more level of escaping (\\\" instead of \" in the representation above).
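
To make the failure concrete, here is a minimal reproduction using the string above. The variable names are only illustrative, and the "fixed" string is simply what the returned value would look like if the inner quotes had been escaped.

```python
import json

# The returned value from above: the quotes around "good" inside the
# explanation are not escaped at the JSON level.
broken = "{\n \"polarity\": \"good\",\n \"explanation\": \"This article is considered \"good\" because ...\"\n}"

try:
    json.loads(broken)
    parsed_ok = True
except json.JSONDecodeError as err:
    parsed_ok = False
    print(f"Invalid JSON: {err}")

# Same value with the inner quotes escaped for JSON
# (written as \\\" inside a Python string literal).
fixed = "{\n \"polarity\": \"good\",\n \"explanation\": \"This article is considered \\\"good\\\" because ...\"\n}"
data = json.loads(fixed)
print(data["explanation"])
```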

Below is the description of the function:

functions = [{
    "name": "find_polarity",
    "description": "Identify if the news is good, bad or neutral and why.",
    "parameters": {
        "type": "object",
        "properties": {
            "polarity": {
                "type": "string",
                "description": "The inherent nature of the news covered in the article.",
                "enum": ["good", "neutral", "bad"]
            },
            "explanation": {
                "type": "string",
                "description": "The explanation for classifying the news as good, bad, or neutral "
                               "based on the inherent nature of the event. "
                               "Make sure the return value is valid JSON."
            }
        },
        "required": ["polarity", "explanation"],
    }
}]

I added “Make sure the return value is valid JSON.” to the description of “explanation”, but it didn’t help much.

I presume there is no way to guarantee that the returned JSON is valid.

How should I handle such quotes? It is not trivial to identify the internal quotes and double escape them while leaving the outer quotes untouched. How would you approach this? Any help is greatly appreciated.

I’ve run into similar problems trying to create escapes for SQL. I think it’s that there’s a lot more data for the tokens w/out the quotes, so it usually forgets them or uses them occasionally.

Might be best to escape everything on your own after you get it back - or at least check that it’s valid. Read the “field” and apply the escaping then put it back in the JSON.

Thanks for the reply.

If I escape everything in the reply string, the quotes enclosing the dict keys and values would be double escaped too, still making the string invalid JSON.

I can check, and in case the returned value is invalid, try to read the field and escape its quotes. But in that case a standard JSON method like json.loads() won’t be usable, and I’d have to apply some regex, etc. to fetch the field value and apply the escaping, right?
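
A rough sketch of that regex approach, in case it helps. It assumes the two-field layout from the function description above, with "explanation" as the last key in the object; the helper names (EXPLANATION_RE, escape_inner_quotes, loads_with_repair) are made up for illustration. It is a heuristic for this one failure mode, not a general JSON repair.

```python
import json
import re

# Assumption: "explanation" is the last key, so its value runs up to the
# final quote before the closing brace.
EXPLANATION_RE = re.compile(r'("explanation"\s*:\s*")(.*)("\s*\}\s*)$', re.DOTALL)


def escape_inner_quotes(match):
    prefix, body, suffix = match.groups()
    # Escape only quotes that are not already preceded by a backslash.
    # (A value ending in an escaped backslash right before a quote would
    # still trip this up.)
    body = re.sub(r'(?<!\\)"', r'\\"', body)
    return prefix + body + suffix


def loads_with_repair(raw):
    """Try json.loads() first; on failure, escape quotes inside "explanation"."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        repaired = EXPLANATION_RE.sub(escape_inner_quotes, raw, count=1)
        # May still raise if the damage is somewhere else in the string.
        return json.loads(repaired)
```

Valid strings go through json.loads() untouched, so the regex only runs on the failure path.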

Unless you can tinker (change) the settings (temp / top_p) and maybe even the model (best if you can afford it) to get consistent results. I’d always check it before you try to load, though, and have some sort of fallback in place.
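
Something like the following is what I mean by checking first and having a fallback. It is only a sketch: call_model is a hypothetical zero-argument callable that returns the raw arguments string from one request, and returning None as the fallback is just one possible choice.

```python
import json

ALLOWED_POLARITIES = {"good", "neutral", "bad"}


def get_polarity(call_model, max_attempts=3):
    """Retry until the reply parses and matches the expected shape.

    call_model is a hypothetical callable returning the raw JSON string.
    Returns the parsed dict, or None if every attempt failed.
    """
    for _ in range(max_attempts):
        raw = call_model()
        try:
            result = json.loads(raw)
        except json.JSONDecodeError:
            continue  # invalid JSON, ask again
        if isinstance(result, dict) and result.get("polarity") in ALLOWED_POLARITIES:
            return result
    return None  # fallback: let the caller decide what to do
```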

I wrote a package to solve this kind of issue. You can check fuzy-jon on PyPI.