Hi all,
I assume the problem is on my end somehow, but I currently don’t understand why / where:
For generating business document drafts, I’m using OpenAI Chat Completions API with structured output (using the Python API library and Pydantic) successfully since months with two different versions of the GPT 4o model.
Experimentally I switched to GPT 4.1 (full model) after it was announced. The actual content of the generated document seems to be noticeably better than before, possibly due to the better instruction following capabilities and better understanding of what NOT to write.
However, what I never experienced with GPT 4o before: GPT 4.1 causes lots of messed up output encoding, returning garbled characters instead of proper UTF-8 encoded characters whenever non ASCII characters occur in the text. It does not happen always (i.e. not for all generated documents), but much too often, and I also was’t able to solve it with specific prompting (to make it pay attention to encoding).
It mainly happens in a step after I feed back a generated first document draft for an additional review round with slightly lower temperature (0.8) (to make it cross check that it properly incorporated the instructions into the generated document). While the version I got in the previous step still seems to have proper special characters, the reviewed version I get back after this step has garbled ones.
So I suspect that I possibly mess up the information I provide for review, but I’m really just taking the text I got from the parsed JSON and feed it back togehter with a new user message. It’s also exactly the same I did with GPT 4o all the time and which never caused any issues until I switched to 4.1…
For debugging / monitoring I’m logging the information I’m about to send back, and it looks ok - though I’m aware that it’s often difficult to identify encoding issues reliably just by looking at logging output. But Python uses unicode strings internally, my console is UTF-8, and the json.dumps
output looks fine.
In my frustration I also upgraded the OpenAI API Python lib to the current version, but no changes.
Did anyone else notice changed behaviour in regard to character encodings?
Unfortunately, I cannot provide real examples here.