GPT-5 (non-reasoning) outputs U+202F (narrow no-break space) instead of normal spaces — breaks text rendering on macOS apps

The GPT-5 (non-reasoning) model has recently started frequently inserting U+202F (Narrow No-Break Space) instead of the standard U+0020 space.

In my environment (and in several other macOS applications, such as T3 Chat), U+202F renders as a very narrow gap, roughly one-fifth the width of a normal space, making the generated text visually compressed and extremely difficult to read.

Symptom Examples:

  • Observed in: Markdown table headers, headings, and, at times, regular prose.
  • Not limited to specific threads: old threads that never produced these characters now do.

Technical Details:

| Field | Value |
| --- | --- |
| Model | GPT-5 (non-reasoning variant) |
| Token ID | 35971 (per OpenAI tokenizer) |
| Unicode | U+202F (Narrow No-Break Space) |
| OS | macOS Sequoia 15.6.1 |
| Hardware | Apple M1 Max, 32 GB RAM |
| Interface | T3 Chat / other macOS UIs |

Request:

Could the OpenAI team clarify whether this frequent substitution of U+0020 with U+202F is intentional (e.g., related to tokenization or watermarking)? If it is not, please flag it for a normalization fix in the GPT-5 generation pipeline. Consistent use of U+0020 for general spacing is crucial for UI readability.
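
For anyone who wants to verify this in their own output, here is a minimal check (just a sketch; the sample string below is made up) that scans a response for U+202F:

```typescript
// Sketch: find U+202F (narrow no-break space) occurrences in model output.
// The sample string is hypothetical; in practice, pass the text returned by the API or chat UI.
function findNarrowSpaces(text: string): number[] {
  const positions: number[] = [];
  for (let i = 0; i < text.length; i++) {
    // U+202F is in the Basic Multilingual Plane, so charCodeAt is sufficient here.
    if (text.charCodeAt(i) === 0x202f) positions.push(i);
  }
  return positions;
}

const sample = "GPT-5\u202Foutput\u202Fwith narrow no-break spaces";
const hits = findNarrowSpaces(sample);
console.log(`Found ${hits.length} U+202F character(s) at indices: ${hits.join(", ")}`);
```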

Thank you,
Patrik


I'm seeing this exact same issue with gpt-5-chat! It's quite annoying; I've worked around it by adding the following to my prompts:
"Do not use narrow no-break spaces (U+202F)"

But it would be great to know whether this is something I should update my renderer to accommodate, so the text doesn't look so compressed.
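
In case it helps anyone else, a minimal renderer-side workaround (just a sketch, assuming you post-process the model text before display; not an official fix) would be to normalize the character to a regular space:

```typescript
// Sketch: replace U+202F (narrow no-break space) with U+0020 before rendering.
function normalizeSpaces(text: string): string {
  return text.replace(/\u202F/g, " ");
}

console.log(normalizeSpaces("2:30\u202FPM and other\u202Fexamples"));
// -> "2:30 PM and other examples"
```

Note that this strips U+202F even where it is legitimate (e.g., French typography between numbers and units), so it may not be the right call for every renderer.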

When did you notice this?

I wonder if OpenAI accidentally picked up these narrow spaces in the training data for an update, or whether there's something else going on :thinking:

I’m not sure, but it’s definitely only happening with GPT-5, so it must just be how it was trained, like you said. GPT-5 in general has been such a big downgrade imo 😬

Interesting article here that seems to describe the issue we’re having: New ChatGPT Models Seem to Leave Watermarks on Text

So it seems some reinforcement-learning training data got contaminated with U+202F characters. Good find, thanks!