GPT 4.1 Character Encoding Issues?

_j · June 3, 2025, 7:04pm

Those are byte values that are not valid UTF-8 on their own, and a model writing them out as hex or escape codes, in strings or even in ordinary prose, is certainly odd output behavior.

The problem with 8-bit code pages (as opposed to ASCII) is that the upper half changes depending on the platform language, which can turn an accented character into a box-drawing character. Windows 95 might have used CP-1252 for English, but the Cyrillic or Nordic version could substitute another byte-to-display mapping, like CP-850 or one of several ISO specs for different languages, whatever. ASCII is only 7-bit (values 0x00 through 0x7F), and the upper half, 128-255, is a minefield; in UTF-8 those byte values now appear only as parts of multi-byte sequences.

You can see your e4 => ä here.
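To make the code page point concrete, here is a minimal Python sketch that decodes the same byte, 0xE4, under a few legacy code pages and then checks it against UTF-8. The helper names `decode_everywhere` and `is_valid_utf8` and the choice of code pages are illustrative assumptions, not something from the original report.

```python
# Decode one byte under several legacy 8-bit code pages (illustrative selection).
CODE_PAGES = ("cp1252", "latin-1", "cp437", "cp1251")


def decode_everywhere(raw: bytes) -> dict[str, str]:
    """Return the character(s) each code page assigns to the given bytes."""
    return {name: raw.decode(name) for name in CODE_PAGES}


def is_valid_utf8(raw: bytes) -> bool:
    """True if the bytes form a complete, valid UTF-8 sequence."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    for name, char in decode_everywhere(b"\xe4").items():
        print(f"{name:8} 0xE4 -> {char}")
    # A lone 0xE4 is only the lead byte of a 3-byte UTF-8 sequence.
    print("valid UTF-8 alone:", is_valid_utf8(b"\xe4"))
    # In UTF-8, the character itself takes two bytes.
    print("UTF-8 bytes for ä:", "ä".encode("utf-8"))
```

The same byte comes out as ä, a Greek capital sigma, or a Cyrillic letter depending on the table, which is why a bare hex value says nothing without knowing the code page.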

So then, we must tell the AI what not to produce, whether the habit comes from its training corpus or from issues in data processing.

# -*- coding: utf-8 -*-

Responses are always natural UTF-8, never using an escape sequence of hex or bytes, nor single-byte code pages. Your render environment and output strings fully-support multi-byte UTF-8 for world languages (while you prefer the ASCII character subset).

You can see if providing instructional clarity along those lines can reduce the need for output processing.
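If some output processing is still needed as a fallback, it could look like the sketch below. It assumes the model emits literal backslash-x escapes such as `\xe4` in its text and that the intended code page is CP-1252; both are guesses about the failure mode, and `repair_hex_escapes` is a hypothetical helper name.

```python
import re

# Matches a literal backslash, 'x', and two hex digits, e.g. the four
# characters \xe4 appearing in the model's text output.
_HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def repair_hex_escapes(text: str, codec: str = "cp1252") -> str:
    """Replace literal \\xNN escapes with the character from the assumed code page."""

    def _substitute(match: re.Match) -> str:
        byte = bytes([int(match.group(1), 16)])
        try:
            return byte.decode(codec)
        except UnicodeDecodeError:
            # Byte is undefined in this code page; leave the escape untouched.
            return match.group(0)

    return _HEX_ESCAPE.sub(_substitute, text)


if __name__ == "__main__":
    print(repair_hex_escapes(r"M\xe4dchen"))
```

This rewrites every matching escape, so it would also alter text where a literal `\xNN` is intended (for example, source code in the response); apply it only to fields where that cannot occur.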
