The same prompt produces correct JSON in “text” mode and incorrect JSON in “json_object” mode.
Here’s the system prompt:
Translate the provided content into German.
The content is provided as a JSON object containing an array "entries".
Each object in the array contains "id", "context" and "html".
Translate only the text within the "html" field, preserving all HTML tags and attributes.
Output as a JSON object containing an array "entries" of objects with the same format as the input.
Output one entry for each input entry, in the same order as the input.
Here’s the user prompt:
{"entries": [{"html": "How would one decide on an ionization technique for acetaminophen or \u03b2-cyclodextrin?"}]}
In text mode, the output is correct:
{
  "entries": [
    {
      "html": "Wie würde man eine Ionisationstechnik für Acetaminophen oder \u03b2-Cyclodextrin auswählen?"
    }
  ]
}
But in json_object mode, it’s wrong:
{
  "entries": [
    {
      "html": "Wie würde man eine Ionisationstechnik für Acetaminophen oder \\\\u03b2-Cyclodextrin auswählen?"
    }
  ]
}
Note the quadruple backslashes, which are incorrect.
This occurs (differently, but still incorrectly) in both gpt-4o and gpt-4-turbo.
This has a significant impact: we can’t constrain the output to JSON with json_object mode, so we have to use text mode and strip the triple backticks from the output ourselves.
Setting temperature to zero produces the same output. Also, unfortunately, it’s not just a matter of sanitization, as the output itself is incorrect. It will literally render as “\u03b2-Cyclodextrin”, not “β-Cyclodextrin”.
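To make the difference concrete, here is a minimal sketch of what a JSON parser does with each variant. It assumes the json_object output carries an escaped backslash in front of u03b2; the exact number of backslashes on the wire may differ from what the post shows, since logging and forum rendering can add their own escaping. The variable names are only for illustration.

```python
import json

# Raw strings keep the backslashes exactly as they would appear on the wire.
text_mode = r'{"entries": [{"html": "\u03b2-Cyclodextrin"}]}'
json_mode = r'{"entries": [{"html": "\\u03b2-Cyclodextrin"}]}'  # assumed form

good = json.loads(text_mode)["entries"][0]["html"]
bad = json.loads(json_mode)["entries"][0]["html"]

print(good)  # β-Cyclodextrin
print(bad)   # \u03b2-Cyclodextrin (a literal backslash, then "u03b2")
```

In the second case the parser is behaving correctly: an escaped backslash followed by u03b2 is a valid JSON string that simply means those six literal characters, which is why no amount of re-parsing turns it back into β.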
The characters are being escaped once for each nesting level:
API response object, containing
messages.content object, containing
json object
At each level, the already-doubled backslashes are doubled again to preserve them.
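The doubling can be reproduced with a short sketch. It assumes each nesting level is produced by a JSON encoder that escapes non-ASCII characters (Python's `json.dumps` does this by default); the key names `content` and `body` and the helper `backslashes_before` are hypothetical and only stand in for the layers listed above.

```python
import json


def backslashes_before(s: str, marker: str = "u03b2") -> int:
    """Count the run of backslashes directly in front of the first marker."""
    i = s.index(marker)
    n = 0
    while i - n - 1 >= 0 and s[i - n - 1] == "\\":
        n += 1
    return n


text = "β-Cyclodextrin"
level1 = json.dumps({"html": text})        # the JSON object itself
level2 = json.dumps({"content": level1})   # that JSON stored as a string value
level3 = json.dumps({"body": level2})      # wrapped once more

for name, s in [("level1", level1), ("level2", level2), ("level3", level3)]:
    print(name, backslashes_before(s))

# Decoding once per level gets the original character back.
restored = json.loads(json.loads(json.loads(level3)["body"])["content"])["html"]
print(restored)
```

Under these assumptions the counts are 1, 2 and 4, so a quadruple backslash is what you would see when looking at text that is still three encodings deep.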
When you actually “print” the response content string, the \u03b2 escape is finally turned into the Unicode character:
{"response_to_user":"Here is the rendered unicode character for \\u03b2: β\n\nHere are some other similar Greek characters in their rendered unicode:\n- α (alpha) - \\u03b1\n- γ (gamma) - \\u03b3\n- δ (delta) - \\u03b4\n- ε (epsilon) - \\u03b5\n- θ (theta) - \\u03b8\n","response_topic":"Unicode characters"}
It all comes down to your code, which you haven’t shown.
Here’s a hint in hint format, mangled by the forum escaping.
{ "response_to_user": "To unpack and print the Unicode character \\u03b2 (which represents the Greek letter beta, \u03b2) from a JSON response in a programming language like Python, you can use thejson module to parse the JSON. Here\u2019s an example:\n\n```python\nimport json\n\n# Example JSON response containing the Unicode character\njson_response = '{\"letter\": \"\\u03b2\"}'\n\n# Parse the JSON response\ndata = json.loads(json_response)\n\n# Access the value and print it\nprint(data['letter']) # Output: \u03b2\n```\n\nIf you're using JavaScript, it would look like this:\n\n```javascript\n// Example JSON response\nconst jsonResponse = '{\"letter\": \"\\u03b2\"}';\n\n// Parse the JSON response\nconst data = JSON.parse(jsonResponse);\n\n// Access the value and print it\nconsole.log(data.letter); // Output: \u03b2\n```\n\nThis should correctly unpack the Unicode character and display it as intended.", "response_topic": "Unicode in JSON" }
The code I’m using is identical whether I specify json_object or text, so I don’t believe the issue is one of escaping in my code. Also, for example, gpt-4-turbo outputs the following:
{
  "entries": [
    {
      "html": "Wie würde man eine Ionisationstechnik für Acetaminophen oder \beta-Cyclodextrin auswählen?"
    }
  ]
}
It may also be that, because you are asking for HTML, the AI is just being dumb about HTML and is overtrained on its own byte sequences.
See how it writes “response_string” as your name.
Tell Mr Chatbot that you want HTML numeric character references, like &#946;, in the HTML it writes, and see if that doesn’t avoid the whole escaping issue.
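For reference, this is roughly what numeric character references look like and why they sidestep JSON escaping. It is only a sketch of the idea, not a claim about what the model will actually emit; `to_char_refs` is a hypothetical helper name.

```python
import html
import json


def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal HTML character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


ref = to_char_refs("β-Cyclodextrin")
print(ref)  # &#946;-Cyclodextrin

# The reference is plain ASCII with no backslashes, so JSON encoding leaves it alone.
payload = json.dumps({"html": ref})
print(payload)

# A browser (or html.unescape) turns it back into the character.
print(html.unescape(ref))  # β-Cyclodextrin
```

Because the reference contains only `&`, `#`, digits and `;`, there is nothing for a JSON encoder to escape at any nesting level.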
I still think it could be just a matter of unwrapping your JSONs correctly instead of casting them to other objects.
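Here is a small sketch of the difference between parsing and casting. The response body is simulated, and its shape (`choices[0].message.content`) is an assumption based on the usual chat completions layout rather than something shown in this thread.

```python
import json

# Simulated HTTP body: the model's JSON is stored as a string inside the response JSON.
model_json = json.dumps({"entries": [{"html": "β-Cyclodextrin"}]})
raw_body = json.dumps({"choices": [{"message": {"content": model_json}}]})

response = json.loads(raw_body)                           # unwrap level 1
content = response["choices"][0]["message"]["content"]    # still a str of JSON text

# Casting: repr() shows the string with its backslashes escaped again.
cast = repr(content)
print(cast)

# Parsing: a second json.loads decodes the \u03b2 escape properly.
parsed = json.loads(content)["entries"][0]["html"]        # unwrap level 2
print(parsed)  # β-Cyclodextrin
```

If the logged or stored value comes from something like `repr()` or a second `json.dumps` rather than from the parsed object, the extra backslashes appear even though the underlying data is fine.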