gpt-5.2 does not follow the specified response format and emits unexpected tokens. These tokens then derail our gpt-5.2-based evaluator.

where: Completions API (hosted in the EU)
time: January 23rd
model: "gpt-5.2",
temperature: 0,
stream: true,
service_tier: "priority",
reasoning_effort: "none",
verbosity: "low"
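For reproduction, the parameters above can be sketched as a plain request payload (a hypothetical `build_request` helper; in practice the resulting dict is passed to the Chat Completions endpoint, e.g. via `client.chat.completions.create(**payload)` with the OpenAI Python SDK):

```python
# Hypothetical reproduction sketch: assembles the exact parameter set
# that triggers the behavior. The message contents are placeholders
# for our full MediVoice prompts.

def build_request(system_message: str, user_message: str) -> dict:
    """Build the Chat Completions payload used in our setup."""
    return {
        "model": "gpt-5.2",
        "temperature": 0,
        "stream": True,
        "service_tier": "priority",
        "reasoning_effort": "none",
        "verbosity": "low",
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message},
        ],
    }
```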
system_message:
"""
…You will interact … with the MediVoice API.
\n\n## MediVoice API\n``` * function msg(message: string): void: Talk to the caller using the TTS system.\n * function set(facts: Record<string, any>): any: Set facts…
"""

Unexpected behavior: The model responds with "set({"phone":"015144389161"}) 乐盈", which does not follow the specified MediVoice API (a conforming response would be msg("乐盈") instead), switches the language, and gives a semantically nonsensical answer.
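Because a conforming MediVoice turn consists only of msg(…) / set(…) calls, malformed turns like the one above can be caught mechanically. A minimal sketch (a hypothetical `is_valid_medivoice_turn` guard, not our production parser):

```python
import re

# A conforming turn is one or more lines, each a single msg(...) or
# set(...) call. Trailing free text (like the stray "乐盈" above)
# makes the line fail the match.
CALL = re.compile(r'^(?:msg|set)\(.*\)$')

def is_valid_medivoice_turn(turn: str) -> bool:
    """Return True if every non-empty line is a msg()/set() call."""
    lines = [ln.strip() for ln in turn.strip().splitlines() if ln.strip()]
    return bool(lines) and all(CALL.fullmatch(ln) for ln in lines)

# is_valid_medivoice_turn('msg("乐盈")')                         -> True
# is_valid_medivoice_turn('set({"phone":"015144389161"}) 乐盈')  -> False
```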

Furthermore, if we use our rating prompt (also through the API, using gpt-5.2), we get the response
"""
mistake: "#outputCooruptions#",
reasoning: "After set({\"existingPatient\":true}) the assistant output contains an … \u000b\u000b\u000b\u000b\u000b\u000b\u000b\u000b\u000b\u000b\u000b\u000b\u000b\u000b\u000b\u000b\u000b\u000b\u000b\u000b…
"""
with "infinitely" many "\u000b" characters.
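As a stopgap, the evaluator output could be pre-checked for this kind of corruption before the rating is trusted. A sketch (a hypothetical `looks_corrupted` check; the run-length threshold of 10 is an arbitrary choice):

```python
import re

# Flag long runs of the same control character, e.g. the repeated
# "\u000b" (vertical tab) we observe in the evaluator's reasoning.
# Covers C0 control characters except \t, \n, \r.
RUN = re.compile(r'([\x00-\x08\x0b\x0c\x0e-\x1f])\1{9,}')

def looks_corrupted(text: str) -> bool:
    """Return True if text contains a run of 10+ identical control chars."""
    return RUN.search(text) is not None
```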

This behavior never occurs when we use gpt-4.1.