Description:
When using Codex Agent on Windows (VS Code / Cursor / Windsurf), files containing Cyrillic comments are displayed incorrectly (garbled characters). The agent appears to invoke powershell.exe (Windows PowerShell 5.1) by default, which falls back to the system ANSI code page instead of UTF-8.
Expected behavior:
Files with UTF-8 encoding (with or without BOM) should display correctly.
Ideally, the agent should:
- Use PowerShell 7 (pwsh) by default (UTF-8 is the default there), or
- Explicitly pass -Encoding UTF8 when reading/writing files, or
- Provide a setting to override the shell used for Agent actions.
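For comparison, here is a minimal Python sketch of what "explicit encoding on every read and write" looks like. It is not how the agent works internally; the helper names (`read_utf8`, `write_utf8`, `demo`) and the sample content are made up for illustration. The `utf-8-sig` codec accepts files both with and without a BOM.

```python
import tempfile
from pathlib import Path

SAMPLE = "// Привет, мир\nconst x = 1;\n"  # hypothetical file content
UTF8_BOM = b"\xef\xbb\xbf"


def read_utf8(path: Path) -> str:
    """Read a file as UTF-8, dropping a leading BOM if one is present."""
    # newline="" keeps line endings exactly as they are stored on disk.
    with open(path, "r", encoding="utf-8-sig", newline="") as handle:
        return handle.read()


def write_utf8(path: Path, text: str) -> None:
    """Write a file as BOM-less UTF-8, ignoring the system code page."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(text)


def demo() -> tuple[str, str]:
    """Read the same text stored once without and once with a BOM."""
    with tempfile.TemporaryDirectory() as tmp:
        plain = Path(tmp) / "plain.js"
        with_bom = Path(tmp) / "bom.js"
        write_utf8(plain, SAMPLE)
        with_bom.write_bytes(UTF8_BOM + SAMPLE.encode("utf-8"))
        return read_utf8(plain), read_utf8(with_bom)
```

Both reads in `demo()` should return the same text, because the encoding is stated explicitly and never inherited from the shell or the system locale.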
Steps to reproduce:
1. Open a UTF-8 encoded JavaScript/TypeScript file with Cyrillic comments in VS Code on Windows.
2. Run Codex Agent and ask it to show the first 10 lines.
3. Observe that Cyrillic characters are replaced with question marks / invalid symbols.
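The two symptoms (invalid symbols vs. question marks) can be reproduced outside the agent. The sketch below is only an illustration of the general mechanism, assuming cp1251 and cp1252 as example ANSI code pages; the function names are hypothetical and the actual code page depends on the Windows locale.

```python
def misread(text: str, wrong_codec: str) -> str:
    """Encode text as UTF-8, then decode the bytes with a single-byte code page.

    This mimics a tool that reads a UTF-8 file as if it were ANSI,
    which produces garbled ("mojibake") characters.
    """
    return text.encode("utf-8").decode(wrong_codec, errors="replace")


def squeeze(text: str, codec: str) -> str:
    """Force text through a code page; characters it lacks become '?'.

    This mimics output being converted to a console code page that
    has no slot for the characters in question.
    """
    return text.encode(codec, errors="replace").decode(codec)


garbled = misread("Привет", "cp1251")   # Cyrillic read as the wrong encoding
lost = squeeze("Привет", "cp1252")      # Cyrillic pushed through a Western code page
polish = squeeze("zażółć", "cp1252")    # only some Polish letters survive
```

The first case is reversible as long as the bytes survive (re-encode with the wrong code page, then decode as UTF-8). The second is not: once a character has become `?`, the original is gone.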
Notes:
This does not happen in the built-in chat (Cursor/Windsurf) because text is read directly from the editor buffer.
Only the Agent (when reading/writing via PowerShell) shows corrupted characters.
Using Node.js or pwsh with -Encoding UTF8 works correctly.
Environment:
Windows 11 (latest build, exact version unknown but fully up to date)
VS Code 1.103.2
Codex extension 0.4.0
Node.js 22.17
PowerShell 7.5.2 installed, but Agent still defaults to legacy Windows PowerShell 5.1
I ran into the same issue (Polish characters getting replaced with ?). The root cause is the same: Codex Agent on Windows uses PowerShell 5.1 (ANSI console encoding), so non-ASCII characters get corrupted before they reach Python or get written back to files.
What fixed it for me was forcing UTF-8 in the PowerShell profile + Python:
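As a rough sketch of the Python half only (the PowerShell profile part is not shown here), something along these lines forces UTF-8 regardless of the console code page. The function names are hypothetical; `reconfigure()` and the `PYTHONUTF8` / `PYTHONIOENCODING` environment variables are standard in Python 3.7+.

```python
import io
import os
import sys


def force_utf8_stdio() -> None:
    """Switch stdout and stderr to UTF-8 regardless of the console code page."""
    for stream in (sys.stdout, sys.stderr):
        # reconfigure() is available on io.TextIOWrapper since Python 3.7;
        # streams replaced by other objects are left untouched.
        if isinstance(stream, io.TextIOWrapper):
            stream.reconfigure(encoding="utf-8")


def utf8_child_env() -> dict[str, str]:
    """Return a copy of the environment that puts child Python processes in UTF-8 mode."""
    env = dict(os.environ)
    env["PYTHONUTF8"] = "1"             # UTF-8 mode (PEP 540)
    env["PYTHONIOENCODING"] = "utf-8"   # encoding for stdin/stdout/stderr
    return env
```

Calling `force_utf8_stdio()` at the top of a script covers that process; passing `env=utf8_child_env()` to `subprocess.run` covers Python processes it launches. Setting `PYTHONUTF8=1` as a user environment variable has the same effect without touching the code.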
The simple fact is that the AI model emits text as UTF-8 encoded Unicode code points. It does not produce output in whichever 8-bit code page you have selected (although ingesting unsanitized bytes in such legacy encodings is a common training-data fault on OpenAI’s part).
Enabling UTF-8 support end-to-end is the solution. PowerShell is not going to translate each Unicode code point into the matching glyph of a 256-value code page for you: what the high-bit bytes (0x80 to 0xFF) represent changes from one language’s code page to the next, and in UTF-8 a set high bit marks a byte as part of a multi-byte sequence. Codex obviously can’t know from a single option whether you are going to see block characters or Shift_JIS: