Bug: Unicode should not be escaped in multiple tool call results

I would like to report an issue with the API that occurs when a tool call involves at least two functions. When the results of these functions contain Unicode characters (e.g., Chinese characters), the characters are escaped into sequences like \u4f60\u597d before being sent back to the LLM. This causes two significant problems:
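
For reference, this is the default behavior of common JSON serializers; Python's json module, for instance, produces exactly these escape sequences (whether this is the actual code path inside the API is only my assumption):

import json

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True)
print(json.dumps({"content": "你好"}))
# {"content": "\u4f60\u597d"}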

  1. Excessive Token Usage: The Chinese phrase “你好” normally counts as a single token, but its escaped form costs roughly seven tokens (see the tokenizer sketch after this list). For text-heavy tools such as web scraping and PDF reading, this silently multiplies token usage severalfold and has already led to substantial additional costs for me.

  2. Degradation of LLM Output Quality: The LLM struggles to interpret content presented as escaped Unicode sequences, because it cannot reliably map each escape sequence back to its original character. This leads to a noticeable drop in the quality of its output.
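
A quick way to see the token inflation is to count tokens with tiktoken (a rough sketch; o200k_base is the tokenizer used by gpt-4o, and exact counts may vary between tokenizers):

import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # the tokenizer used by gpt-4o

raw = "你好"
escaped = r"\u4f60\u597d"  # the escaped form that actually reaches the model

# the escaped form costs several times more tokens than the raw characters
print(len(enc.encode(raw)), len(enc.encode(escaped)))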

Below is a Python script to reproduce this bug:

import os
from openai import OpenAI

# Conversation with two parallel tool calls whose results contain Chinese text
data = [
    {
        "role": "user",
        "content": "Test",
    },
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "id": "call_demo1",
                "type": "function",
                "function": {
                    "name": "open_url",
                    "arguments": '{"url": "https://example1.com/"}',
                },
            },
            {
                "id": "call_demo2",
                "type": "function",
                "function": {
                    "name": "open_url",
                    "arguments": '{"url": "https://example2.com/"}',
                },
            },
        ],
    },
    {
        "role": "tool",
        "name": "open_url",
        "content": 'result: 这是一句中文示例',
        "tool_call_id": "call_demo1",
    },
    {
        "role": "tool",
        "name": "open_url",
        "content": 'result: 这是一句中文示例你好',
        "tool_call_id": "call_demo2",
    },
    {
        "role": "user",
        "content": "Repeat the raw results above in a Python list",
    },
]

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
)

chat_completion = client.chat.completions.create(
    messages=data,
    model="gpt-4o",
)

print(chat_completion)

data[3]['content'] = data[3]['content'][:-2]  # drop the trailing "你好" from the second tool result

# Resend the conversation without "你好" to compare prompt token counts
chat_completion = client.chat.completions.create(
    messages=data,
    model="gpt-4o",
)

print(chat_completion)

And here is the output from the script:

ChatCompletion(id='chatcmpl-9ZRwJ54sIO4vMUspyyW83ECnJhCUM', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='```python\n["result: \\u8fd9\\u662f\\u4e00\\u53e5\\u4e2d\\u6587\\u793a\\u4f8b", "result: \\u8fd9\\u662f\\u4e00\\u53e5\\u4e2d\\u6587\\u793a\\u4f8b\\u4f60\\u597d"]\n```', role='assistant', function_call=None, tool_calls=None))], created=1718235907, model='gpt-4o-2024-05-13', object='chat.completion', system_fingerprint='fp_319be4768e', usage=CompletionUsage(completion_tokens=83, prompt_tokens=164, total_tokens=247))
ChatCompletion(id='chatcmpl-9ZRwL4Q2UaRnyRH3ZLJrDoaUhNJz6', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='```python\nresults = [\n    "result: \\u8fd9\\u662f\\u4e00\\u53e5\\u4e2d\\u6587\\u793a\\u4f8b",\n    "result: \\u8fd9\\u662f\\u4e00\\u53e5\\u4e2d\\u6587\\u793a\\u4f8b"\n]\n```', role='assistant', function_call=None, tool_calls=None))], created=1718235909, model='gpt-4o-2024-05-13', object='chat.completion', system_fingerprint='fp_319be4768e', usage=CompletionUsage(completion_tokens=82, prompt_tokens=157, total_tokens=239))

As shown in the output, adding the two-character phrase “你好” raises prompt_tokens from 157 to 164, i.e. by seven tokens. Moreover, the LLM’s output shows that it sees the tool call results in their escaped form rather than as raw text. Notably, this issue is also reproducible on older models.

This problem is similar to the one reported here, but that issue concerns Unicode escaping in the function call arguments generated by the LLM, whereas the present problem occurs when function call results are fed back to the LLM.

Proposed Solution: As with a single tool call, the raw results of multiple tool calls should be passed to the LLM without escaping Unicode characters. I suspect the escaping happens when the multiple results are JSON-encoded; if so, please make sure the original Unicode characters are preserved during encoding.
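
If the results are indeed serialized with something like Python's json module (an assumption on my part; I do not know the actual server-side implementation), passing ensure_ascii=False would keep the characters intact, for example:

import json

tool_result = "result: 这是一句中文示例你好"

# ensure_ascii=False keeps the raw characters instead of producing \uXXXX escapes
print(json.dumps({"content": tool_result}, ensure_ascii=False))
# {"content": "result: 这是一句中文示例你好"}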