Codex VSCode - Major issue - LLM suddenly can't handle encoding

Deex · October 16, 2025, 11:30pm

Greetings,

Since around a day ago, I’ve noticed severe bugs when using Codex High Local in VSCode. The model appears to have lost its understanding of encoding rules. Nearly every second task now results in corrupted code across several different projects. After some investigation, I was able to identify what’s happening.

The model is now using methods for reading and writing files that corrupt the encoding and destroy code integrity. This issue occurs not only with PowerShell but also with Python, often leading to completely broken files.

Below is a technical summary of the problem:

Bug Summary: Encoding Corruption When Using PowerShell for File Edits

When modifying project files using PowerShell commands such as Get-Content, Set-Content, -replace, or manual byte conversions, files may become encoding-corrupted.

Technical Cause

In Windows PowerShell 5.1, the default file encoding for these cmdlets is the system ANSI code page (CP1252 on many Western-language Windows installations), not UTF-8.
When a script reads or writes UTF-8 files (for example, PHP files containing <?php, special characters, or umlauts) using these commands without explicitly specifying UTF-8 encoding, PowerShell silently treats the text as CP1252. Non-ASCII characters then end up as wrong or invalid byte sequences, which shows up as corrupted characters.
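
To make the mechanism concrete, here is a minimal Python sketch of one way this kind of corruption can arise. It assumes the UTF-8 bytes are decoded as CP1252 and the result is later written back as UTF-8. The sample string is made up for illustration, and this is not a trace of what Codex or PowerShell actually executed.

```python
# Illustrative only: reproduce a UTF-8 -> CP1252 mis-decoding in Python.
original = "Größe: 5 €"  # made-up sample text with umlauts and a euro sign

# The file on disk holds these UTF-8 bytes.
utf8_bytes = original.encode("utf-8")

# Step 1: a tool that assumes CP1252 decodes the UTF-8 bytes incorrectly.
# Every multi-byte character turns into two or three unrelated characters.
misread = utf8_bytes.decode("cp1252")

# Step 2: the misread text is written back out as UTF-8, so the garbage
# is now stored permanently and the file grows.
double_encoded = misread.encode("utf-8")

print("original:", original)
print("misread: ", misread)
print("bytes before/after:", len(utf8_bytes), len(double_encoded))
```

Each further "conversion" pass of this kind adds another layer, which would match the escalating damage described later in this post.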

Example Symptoms

UTF-8 files with <?php lose the BOM or break PHP syntax.
Umlauts and special characters (ä, ö, ü, ß, etc.) appear as garbled text.
Git diffs show massive binary changes even for minor edits.
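
The symptoms above can be checked for mechanically. The following is a rough sketch, not a complete detector: `check_bytes`, `scan_tree`, the marker list, and the file suffixes are all hypothetical names and choices for illustration. It flags files that are not valid UTF-8 and files that contain character pairs typical of UTF-8 text misread as CP1252.

```python
from pathlib import Path

# Build typical mojibake markers by deliberately mis-decoding a few
# characters (assumption: German umlauts and the euro sign are representative).
MOJIBAKE_MARKERS = tuple(
    ch.encode("utf-8").decode("cp1252") for ch in "äöüßÄÖÜ€"
) + ("\ufffd",)  # U+FFFD is the "diamond with question mark" replacement character


def check_bytes(data: bytes) -> list:
    """Return a list of encoding problems found in one file's raw bytes."""
    try:
        text = data.decode("utf-8")  # strict decoding: raises on invalid bytes
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8 at byte {exc.start}"]
    found = [m for m in MOJIBAKE_MARKERS if m in text]
    if found:
        return ["possible mojibake: " + ", ".join(repr(m) for m in found)]
    return []


def scan_tree(root: str, suffixes=(".php", ".py", ".js", ".html", ".css")) -> dict:
    """Map each suspicious file under `root` to its list of problems."""
    report = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            problems = check_bytes(path.read_bytes())
            if problems:
                report[str(path)] = problems
    return report
```

Calling `scan_tree(".")` from the project root returns a dictionary of suspicious files. Expect some false positives, since a file can legitimately contain one of these character pairs.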

This also sometimes happens even when I explicitly instruct the model (within the agent system) not to use those unsafe methods — it still does, which suggests something deeper is wrong, possibly in the system prompts or instruction handling.

Escalation of the Issue

Furthermore, the situation becomes even worse when you instruct the model to fix the corrupted files. It does not seem to understand what actually happened and instead continues generating additional scripts that attempt to “convert” parts of the code again. This leads to even more severe corruption, eventually resulting in completely broken files that contain invalid or binary-like content.

I tested this behavior across multiple projects in different programming languages, and the issue appears consistently everywhere.
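
For comparison, a careful repair has to reverse exactly the mis-decoding that happened and refuse to touch text it cannot reverse cleanly. A minimal sketch, assuming a single layer of "UTF-8 read as CP1252" damage; `undo_cp1252_mojibake` is a hypothetical helper, not something the model or any tool provides:

```python
from typing import Optional


def undo_cp1252_mojibake(text: str) -> Optional[str]:
    """Try to reverse one UTF-8 -> CP1252 mis-decoding.

    Returns the repaired text, or None if the text cannot be reversed
    cleanly (for example because it was never damaged in this way).
    """
    try:
        # Turn the wrong characters back into the original bytes,
        # then decode those bytes the way they were meant to be read.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None


damaged = "Größe".encode("utf-8").decode("cp1252")
print(undo_cp1252_mojibake(damaged))   # repaired text
print(undo_cp1252_mojibake("Größe"))   # None: intact text is left alone
```

Pure ASCII text passes through unchanged. Python's `cp1252` codec leaves five byte values undefined, so some damaged characters cannot be recovered this way. Where version control is available, restoring the file from there is likely the safer option.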

Severity

This is a critical issue, as it can occur silently, file by file, without any visible warnings. The model gradually destroys codebases until numerous files are corrupted and the project starts producing errors.
If you then ask the model to fix the problem without explicitly explaining the root cause (encoding corruption), it continues to rewrite files incorrectly and can and will irreversibly damage large portions of the codebase.

Deex · October 16, 2025, 11:47pm

@All

Please add the following to your agent instructions (this will not fully prevent the problem). Also search your codebase for this error, and repeat the search from time to time, to find out whether it has already been corrupted.

**Encoding Safety:** Do **not** use PowerShell `Get-Content`/`Set-Content`, `-replace`, or ad-hoc byte conversion for editing project files. They default to CP1252 and corrupt UTF-8 files (e.g., `<?php`, umlauts).

Use `apply_patch` or Python (`python - <<'PY' … PY`) with explicit UTF‑8 read/write for all file edits.
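
As a rough idea of what "explicit UTF‑8 read/write" could look like in Python, here is a minimal sketch. `replace_in_file` is a hypothetical helper name. It works on raw bytes so that line endings are not translated, decodes strictly so that invalid input raises an error instead of being silently altered, and keeps a UTF-8 BOM only if the file already had one.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def replace_in_file(path: str, old: str, new: str) -> int:
    """Replace `old` with `new` in a UTF-8 file; return the number of replacements."""
    file = Path(path)
    raw = file.read_bytes()
    has_bom = raw.startswith(UTF8_BOM)

    # "utf-8-sig" strips a leading BOM if present; decoding is strict,
    # so a file that is not valid UTF-8 raises UnicodeDecodeError here.
    text = raw.decode("utf-8-sig")

    count = text.count(old)
    if count:
        out = text.replace(old, new).encode("utf-8")
        if has_bom:
            out = UTF8_BOM + out  # preserve the original BOM state
        file.write_bytes(out)  # bytes in, bytes out: no newline translation
    return count
```

The file is only rewritten when something actually changed, which keeps unrelated files out of the Git diff.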

gusdeoliveira · November 25, 2025, 5:35am

Unfortunately, this is still an issue with GPT 5.1. I created a new project and suddenly my text has the wrong encoding; special characters are only displayed as diamonds with question marks.

Currently using Codex on CLI along with manual edits on Visual Studio.

The agent instruction workaround appears to be working so far. Thank you!

gusdeoliveira · December 18, 2025, 2:54am

Deex:

**Encoding Safety:** Do **not** use PowerShell `Get-Content`/`Set-Content`, `-replace`, or ad-hoc byte conversion for editing project files. They default to CP1252 and corrupt UTF-8 files (e.g., `<?php`, umlauts).

Use `apply_patch` or Python (`python - <<'PY' … PY`) with explicit UTF‑8 read/write for all file edits.

Still happening with gpt-5.2 btw
