A tip for returning multibyte characters from your functions to GPT (function calling)

I’d like to share a tip about returning multibyte characters from functions to GPT during function calls.

In the official documentation, the example uses json.dumps(response) to serialize the response from your functions. By default, json.dumps escapes non-ASCII characters into ASCII escape sequences, such as \u6674\u308c.
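Here is a minimal snippet illustrating the default behavior (the response dict is just a hypothetical function result for demonstration):

```python
import json

# Hypothetical function result containing Japanese text ("晴れ" = "sunny")
response = {"weather": "晴れ"}

# Default behavior: non-ASCII characters become \uXXXX escape sequences
print(json.dumps(response))
# -> {"weather": "\u6674\u308c"}
```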

However, GPT does not automatically decode these sequences; it processes them as-is. This is suboptimal in terms of token usage: “晴れ” consumes only three tokens, while its escaped form “\u6674\u308c” uses six (you can try it here). Moreover, given that the majority of the training data isn’t in this escaped format, performance might also vary slightly.
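If you want to check the counts yourself, a quick sketch using the tiktoken library (assuming the cl100k_base encoding used by GPT-3.5/GPT-4; exact counts may differ by model):

```python
import tiktoken

# cl100k_base is the encoding used by the GPT-3.5/GPT-4 models
enc = tiktoken.get_encoding("cl100k_base")

print(len(enc.encode("晴れ")))            # raw multibyte text
print(len(enc.encode(r"\u6674\u308c")))   # escaped form, literal backslashes
```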

To address this, I suggest using ensure_ascii=False with json.dumps() when returning responses to GPT. It would also be beneficial if the team at OpenAI could reflect this recommendation in the official documentation.
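Concretely, the fix is a one-argument change (again using a hypothetical response dict):

```python
import json

response = {"weather": "晴れ"}

# ensure_ascii=False keeps multibyte characters intact instead of escaping them
message = json.dumps(response, ensure_ascii=False)
print(message)
# -> {"weather": "晴れ"}
# Pass `message` back to the model as the function response content
```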
