GPT-4o returning malformed Unicode like \u0000e6 instead of æ — encoding bug?

Hi all,

I’m encountering a recurring encoding issue using GPT‑4o via the Chat API (no streaming). The issue appears when the response contains non-ASCII characters like æ, ø, å. Instead of valid Unicode escape sequences (like \u00e6), I often get corrupted ones such as:

"Jeg har modtaget tilstr\u0000e6kkelig information til ..."

Expected:

"Jeg har modtaget tilstrækkelig information til ..."

This occurs intermittently, even with identical prompts and schema input. It appears before any post-processing on my side — directly in chatResponse.Choices[0].Message.Content.

Example from debugger:

ValueKind = String : 
"{"JobAdCreateStatus":{"wascreated":true,"explanation":"Jeg har modtaget tilstr\u0000e6kkelig information til ..."

I’m using:

  • Model: gpt-4o
  • Endpoint: Chat completion
  • Tool use / function calling with JSON schema
  • SDK: Official .NET OpenAI SDK
  • Streaming: disabled
  • Encoding: UTF‑8 throughout

This behavior does not occur consistently, but often enough that it corrupts production content when characters like æ or ø are used.
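
For now I'm considering a post-processing repair before deserializing. A minimal sketch (in Python for brevity, though my pipeline is .NET), assuming the corruption is always a stray "00" injected right after "\u" and that the model never legitimately emits a literal NUL escape:

```python
import json
import re

# Matches the corrupted form "\u0000e6" and captures the two hex digits that
# should have followed "\u00". Assumes a legitimate "\u0000" (NUL) never occurs.
_CORRUPT_ESCAPE = re.compile(r'\\u0000([0-9a-fA-F]{2})')

def repair_escapes(raw: str) -> str:
    """Rewrite '\\u0000e6'-style sequences back to '\\u00e6' before parsing."""
    return _CORRUPT_ESCAPE.sub(r'\\u00\1', raw)

raw = '{"JobAdCreateStatus":{"wascreated":true,"explanation":"tilstr\\u0000e6kkelig"}}'
fixed = json.loads(repair_escapes(raw))
print(fixed["JobAdCreateStatus"]["explanation"])  # -> tilstrækkelig
```

That only papers over the symptom, though, so I'd still like to understand the root cause.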

Has anyone else seen similar corruption from GPT‑4o?

Any insights or official response would be very helpful 🙏

Thanks,
— Thomas


Yes, this has come up in another topic as well.

You’ll have to address it with the model itself: instruct it that it should natively output unescaped UTF-8 text, even inside structured output and code (provided the code’s encoding is Unicode, as in Python 3), and that it should not emit high-ASCII code-page bytes.
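
A minimal sketch of that instruction on Chat Completions (Python; the wording is illustrative and should be adapted to your own system prompt):

```python
from openai import OpenAI

client = OpenAI()

# Illustrative wording only; adapt it to your own system prompt.
system_msg = (
    "Always write text natively as UTF-8. Output characters such as æ, ø and å "
    "directly, never as \\uXXXX escape sequences and never as legacy code-page "
    "bytes, including inside JSON structured output and inside code."
)

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system_msg},
        # Danish user prompt, matching the OP's use case ("Summarize the job ad briefly.")
        {"role": "user", "content": "Opsummer jobannoncen kort."},
    ],
)
print(resp.choices[0].message.content)
```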

You can also put some example outputs in the system message, in the user’s language if known, or alternatively include some example chat turns (on Chat Completions you can set "name":"example" alongside the role); see the sketch after the logit_bias note below.

You can use logit_bias to demote certain tokens on Chat Completions, in this case, 7570 for “\u”.
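
Putting the example turns and the bias together, roughly like this (Python sketch; the "name" values, the example content, and the -100 bias are all illustrative, and 7570 is the token id claimed above for “\u”):

```python
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "system",
     "content": "Always write æ, ø and å directly as UTF-8, never as \\uXXXX escapes."},
    # Few-shot example turns; "name" marks them as examples rather than real dialogue.
    {"role": "user", "name": "example_user",
     "content": "Bekræft at du har modtaget oplysningerne."},
    {"role": "assistant", "name": "example_assistant",
     "content": "Jeg har modtaget tilstrækkelig information til at oprette annoncen."},
    # The real request ("Create the job ad from the information above.")
    {"role": "user", "content": "Opret jobannoncen ud fra oplysningerne ovenfor."},
]

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    # Demote the token said to map to "\u" (id 7570); -100 effectively bans it.
    logit_bias={"7570": -100},
)
print(resp.choices[0].message.content)
```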


Neither feature (the message “name” field nor logit_bias) is available on the Responses API.

You also have gpt-4o-2024-11-20 to try out.

This ultimately needs OpenAI to acknowledge and fix the models. One shouldn’t have to do extensive fine-tuning just to get the AI to write a world language properly.