[SERVER-SIDE ISSUE] High token cost for non-Latin characters in structured output descriptions

TL;DR

If you write the descriptions of structured output parameters in a non-Latin script, for example Cyrillic (Russian, Kazakh, and others), Chinese, or Japanese, a lot of tokens are wasted, because non-ASCII characters are converted to \uXXXX escape sequences when the JSON schema is serialized to a string. In other words, writing “Имя пользователя - Том” (in English, "Username - Tom") in a normal prompt consumes 5 tokens, but if I create a structured output with the parameter "username" and the description “Имя пользователя”, and write “Том” in the prompt, it consumes more than 79 tokens, because the schema sent to the OpenAI server is serialized to a string without ensure_ascii=False. This is very easy to fix, and the fix would benefit every language that does not use the Latin alphabet.
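
The escaping itself is easy to see in isolation. A minimal sketch (tiktoken is used here only to count tokens the same way as in the full repro below):

import json

import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")
desc = {"description": "Имя пользователя"}  # "Username"

escaped = json.dumps(desc)                      # default: ensure_ascii=True
native = json.dumps(desc, ensure_ascii=False)   # keeps UTF-8 as-is

print(escaped)  # {"description": "\u0418\u043c\u044f \u043f\u043e..."}
print(native)   # {"description": "Имя пользователя"}
print(len(enc.encode(escaped)), "tokens vs", len(enc.encode(native)), "tokens")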

Describe the bug

When using Pydantic models with Cyrillic text (or other non-ASCII characters) in field descriptions for structured output with client.beta.chat.completions.parse(), the token count becomes significantly higher than expected due to Unicode escaping in JSON serialization.

The issue appears to stem from Python’s json.dumps() default of ensure_ascii=True, which converts non-ASCII characters to \uXXXX Unicode escape sequences. This happens during HTTP request serialization, when the Pydantic schema is converted to JSON for the API request. For example:

import json
from pydantic import BaseModel, Field
from openai import OpenAI

import tiktoken
enc = tiktoken.encoding_for_model("gpt-4o")

user_prompt = "My name John"
messages = [{"role": "user", "content": user_prompt}]

class Schema(BaseModel):
    # Russian description: "The user's name, if they gave it. If not, leave an empty string"
    user_name: str = Field(description="Имя пользователя, если он его называл. Если нет, то оставь пустую строку")

schema = Schema.model_json_schema()

print("--- SCHEMA (ensure_ascii=False) ---")
str_schema = json.dumps(schema, ensure_ascii=False)
num_tokens = len(enc.encode(str_schema))
print(str_schema)
print(f"Num tokens: {num_tokens}")

print("\n--- SCHEMA (ensure_ascii=True) ---")
str_schema = json.dumps(schema, ensure_ascii=True)
num_tokens = len(enc.encode(str_schema))
print(str_schema)
print(f"Num tokens: {num_tokens}")

print("\n--- USER PROMPT ---")
num_tokens = len(enc.encode(user_prompt))
print(user_prompt)
print(f"Num tokens: {num_tokens}")

print("\n--- MESSAGES ---")
str_messages = str(messages)
num_tokens = len(enc.encode(str_messages))
print(str_messages)
print(f"Num tokens: {num_tokens}")

with OpenAI() as client:  # reads OPENAI_API_KEY from the environment
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=messages,
        response_format=Schema,
    )

print("\n--- Prompt tokens (from response) ---")
print(f"Num tokens: {response.usage.prompt_tokens}")

Result:

--- SCHEMA (ensure_ascii=False) ---
{"properties": {"user_name": {"description": "Имя пользователя, если он его называл. Если нет, то оставь пустую строку", "title": "User Name", "type": "string"}}, "required": ["user_name"], "title": "Schema", "type": "object"}
Num tokens: 65

--- SCHEMA (ensure_ascii=True) ---
{"properties": {"user_name": {"description": "\u0418\u043c\u044f \u043f\u043e\u043b\u044c\u0437\u043e\u0432\u0430\u0442\u0435\u043b\u044f, \u0435\u0441\u043b\u0438 \u043e\u043d \u0435\u0433\u043e \u043d\u0430\u0437\u044b\u0432\u0430\u043b. \u0415\u0441\u043b\u0438 \u043d\u0435\u0442, \u0442\u043e \u043e\u0441\u0442\u0430\u0432\u044c \u043f\u0443\u0441\u0442\u0443\u044e \u0441\u0442\u0440\u043e\u043a\u0443", "title": "User Name", "type": "string"}}, "required": ["user_name"], "title": "Schema", "type": "object"}
Num tokens: 233

--- USER PROMPT ---
My name John
Num tokens: 3

--- MESSAGES ---
[{'role': 'user', 'content': 'My name John'}]
Num tokens: 16

--- Prompt tokens (from response) ---
Num tokens: 240

  • Schema with Cyrillic description: 233 tokens (with Unicode escapes)
  • Same schema without escaping would be: 65 tokens
  • 3.6x token overhead for non-ASCII text in schema descriptions

Recommendation: Update the server-side JSON serialization for structured output (function/tool calling, JSON mode) to use ensure_ascii=False. This would prevent non-Latin characters (like Cyrillic) from being escaped into \uXXXX ASCII sequences and preserve their native form, significantly reducing token usage for developers worldwide.
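
Until there is a server-side fix, one workaround (a sketch, not an official recommendation) is to keep schema descriptions in ASCII-only English and move any localized instructions into the messages, where UTF-8 text is tokenized at its normal rate (see the TL;DR above):

from openai import OpenAI
from pydantic import BaseModel, Field


class Schema(BaseModel):
    # ASCII-only description: serializes without \uXXXX escapes
    user_name: str = Field(
        description="The user's name, if they gave it; otherwise an empty string"
    )


with OpenAI() as client:  # reads OPENAI_API_KEY from the environment
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            # Localized guidance goes in the prompt, where Cyrillic is
            # tokenized natively rather than as escape sequences
            {"role": "system", "content": "Отвечай на русском языке."},  # "Reply in Russian."
            {"role": "user", "content": "My name John"},
        ],
        response_format=Schema,
    )
    print(response.usage.prompt_tokens)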


Here’s the smoking gun. The attached screenshot shows two raw cURL calls to the API. One with proper UTF-8, one with escaped characters. Both cost 240 tokens.

Even when sending a perfectly optimized UTF-8 request (left), the token cost is identical to the inefficient, escaped version (right).

The problem isn’t the client; it’s how the server calculates the cost for response_format.
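
For anyone who wants to reproduce this without the screenshot, here is a sketch of the same experiment in Python (equivalent to the two raw cURL calls): the identical request body is sent once as native UTF-8 and once with \uXXXX escapes, and the server reports the same prompt_tokens for both.

import json
import os

import httpx

body = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "My name John"}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "Schema",
            "schema": {
                "type": "object",
                "properties": {
                    "user_name": {
                        "type": "string",
                        "description": "Имя пользователя, если он его называл. Если нет, то оставь пустую строку",
                    }
                },
                "required": ["user_name"],
                "additionalProperties": False,
            },
        },
    },
}

headers = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "Content-Type": "application/json; charset=utf-8",
}

# Same request twice: native UTF-8 body vs. escaped body.
for label, payload in [
    ("utf-8", json.dumps(body, ensure_ascii=False)),
    ("escaped", json.dumps(body, ensure_ascii=True)),
]:
    resp = httpx.post(
        "https://api.openai.com/v1/chat/completions",
        headers=headers,
        content=payload.encode("utf-8"),
        timeout=60,
    )
    print(label, resp.json()["usage"]["prompt_tokens"])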

Is this behavior intentional, and when can we expect a fix for this?