Incorrect Cyrillic rendering in Codex Agent on Windows due to PowerShell 5.1 default ANSI encoding

Description:
When using Codex Agent on Windows (VS Code / Cursor / Windsurf), files containing Cyrillic comments are displayed incorrectly (garbled characters). The agent appears to invoke powershell.exe by default, which uses the legacy ANSI code page instead of UTF-8.

Expected behavior:

  • Files with UTF-8 encoding (with or without BOM) should display correctly.

  • Ideally, the agent should:

    • Use PowerShell 7 (pwsh) by default (UTF-8 is the default there), or

    • Explicitly pass -Encoding UTF8 when reading/writing files, or

    • Provide a setting to override the shell used for Agent actions.

Steps to reproduce:

  1. Open a UTF-8 encoded JavaScript/TypeScript file with Cyrillic comments in VS Code on Windows.

  2. Run Codex Agent and ask it to show the first 10 lines.

  3. Cyrillic characters are replaced with question marks / invalid symbols.
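The corruption can be reproduced without Codex at all: when UTF-8 bytes are decoded through a single-byte ANSI code page, each multi-byte character turns into mojibake. A minimal Python sketch (cp1251 stands in for a Russian-locale console here; other locales use other code pages):

```python
# Simulate PowerShell 5.1 reading a UTF-8 file with its ANSI code page.
original = "Привет, мир"            # Cyrillic text as stored on disk (UTF-8)
utf8_bytes = original.encode("utf-8")

# What an ANSI (cp1251) console shows for those bytes:
garbled = utf8_bytes.decode("cp1251")
print(garbled)

# The damage is reversible only as long as nothing re-encodes the mojibake lossily:
assert garbled.encode("cp1251").decode("utf-8") == original
```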

Notes:

  • This does not happen in the built-in chat (Cursor/Windsurf) because text is read directly from the editor buffer.

  • Only the Agent (when reading/writing via PowerShell) shows corrupted characters.

  • Using Node.js or pwsh with -Encoding UTF8 works correctly.
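The Node.js / pwsh observation generalizes: any reader that decodes the bytes explicitly as UTF-8 gets clean text, regardless of the console code page. In Python terms:

```python
import os
import tempfile

# Write a UTF-8 file with Cyrillic content, then read it back with an
# explicit encoding rather than relying on the platform's ANSI default.
text = "// Комментарий на русском\n"
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".js",
                                 delete=False) as f:
    f.write(text)
    path = f.name

with open(path, encoding="utf-8") as f:  # explicit, like pwsh -Encoding UTF8
    assert f.read() == text

os.unlink(path)
print("round trip OK")
```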

Environment:

  • Windows 11 (latest build, exact version unknown but fully up to date)

  • VS Code 1.103.2

  • Codex extension 0.4.0

  • Node.js 22.17

  • PowerShell 7.5.2 installed, but Agent still defaults to legacy Windows PowerShell 5.1

  • Hardware: standard desktop PC

1 Like

Same problem: all my files get corrupted by Codex's incorrect encoding handling during local edits on Windows.

1 Like

Same problem here. It seems that it happens not right off the bat, but after a while.

1 Like

I ran into the same issue (Polish characters getting replaced with ?). The root cause is the same: Codex Agent on Windows uses PowerShell 5.1 (ANSI console encoding), so non‑ASCII characters get corrupted before they reach Python or get written back to files.

What fixed it for me was forcing UTF‑8 in the PowerShell profile + Python:

# Force UTF-8 console and pipeline encoding (PowerShell 5.1 defaults to ANSI)
chcp 65001 > $null
$utf8NoBom = [System.Text.UTF8Encoding]::new($false)
[Console]::InputEncoding  = $utf8NoBom
[Console]::OutputEncoding = $utf8NoBom
$OutputEncoding = $utf8NoBom

# Make the file cmdlets default to UTF-8 as well
$PSDefaultParameterValues["Out-File:Encoding"] = "utf8"
$PSDefaultParameterValues["Set-Content:Encoding"] = "utf8"
$PSDefaultParameterValues["Add-Content:Encoding"] = "utf8"
$PSDefaultParameterValues["Get-Content:Encoding"] = "utf8"

# Tell Python to use UTF-8 for its I/O
$env:PYTHONUTF8 = "1"
$env:PYTHONIOENCODING = "utf-8"

After restarting PowerShell, an inline script like this Polish-character test:
@'
print("zażółć gęślą jaźń")
'@ | python -X utf8 -
prints correctly, and files stop getting mangled.
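A quick way to confirm the Python side of the fix took effect (a sketch assuming Python 3.7+, where PYTHONUTF8=1 / -X utf8 enable UTF-8 mode):

```python
import os
import subprocess
import sys

# Launch a child Python with PYTHONUTF8=1, as set in the profile above,
# and check which encoding its stdout actually uses.
env = dict(os.environ, PYTHONUTF8="1")
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
    env=env, capture_output=True, text=True,
)
print(result.stdout.strip())  # expected: utf-8
```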

Long‑term fix: Codex Agent should default to pwsh (PowerShell 7) or allow a shell override, or always force UTF‑8 for reads/writes.

1 Like

Not sure if this will help. I also had a UTF-8 problem on Windows, but it may have been a different one.

Could you try disabling the feature and see if that resolves your problem? Add the following to your config.toml file.

[features]
powershell_utf8 = false
1 Like

The simple fact is that the AI model produces UTF-8 bytes encoding Unicode code points. It does not produce output in whatever 8-bit code page you happen to have selected (although ingesting unsanitized world-language bytes is a common training fault on OpenAI's part).

Enabling UTF-8 support end-to-end is the solution. PowerShell is not going to do this translation work for you, mapping Unicode code points onto the glyphs of a 256-value code page, across language sets that assign conflicting meanings to the high-bit characters (where a set bit 8 marks a multi-byte UTF-8 sequence). And Codex obviously can't know from a single option whether you will end up seeing block characters or Shift_JIS.
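The high-bit point can be made concrete: in UTF-8, ASCII characters stay below 0x80, while every byte of a multi-byte sequence has bit 8 set, which is exactly the range an ANSI code page reinterprets as its own single-byte glyphs. A small sketch:

```python
# ASCII characters are single bytes with the high bit clear; anything else
# becomes a multi-byte sequence whose bytes all have the high bit set.
for ch in ("a", "ż", "я"):
    encoded = ch.encode("utf-8")
    high_bits = [bool(b & 0x80) for b in encoded]
    print(ch, [hex(b) for b in encoded], high_bits)

assert "a".encode("utf-8") == b"a"                  # high bit clear
assert all(b & 0x80 for b in "я".encode("utf-8"))   # every byte >= 0x80
```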

2 Likes