When using tool calling with an Enum type, all properties and enum values written in Cyrillic produce corrupted symbols like \u0000... or \x00... in the generated object.
The issue does not occur when Latin characters are used instead. Cyrillic strings should be serialized and deserialized correctly without corruption. Instead of readable Cyrillic text, the resulting object contains garbled sequences with null bytes.
Steps to reproduce
- Define an
Enumwhere member names or values use Cyrillic letters. - Use this Enum within a tool calling function.
- Inspect the generated or serialized result.
Possible cause
Likely an encoding mismatch during serialization/deserialization (utf-8 vs utf-16 / utf-32) or improper string handling during JSON generation.
Suggestion
Ensure consistent UTF-8 encoding across the pipeline and, if applicable, use:
json.dumps(obj, ensure_ascii=False)