I’d like to share a tip about returning multibyte characters from functions to GPT during function calls.
In the official documentation, the example uses `json.dumps(response)` to serialize the response from your functions. By default, this method escapes multibyte characters into ASCII escape sequences, so a string like "晴れ" becomes "\u6674\u308c".
However, GPT does not automatically decode these sequences and instead processes them as is. This is wasteful in terms of token usage: "晴れ" consumes only three tokens, while its escaped form "\u6674\u308c" uses six (you can verify this with OpenAI's tokenizer). Moreover, since little of the training data is in this escaped format, model performance on such strings may also vary slightly.
To address this, I suggest passing `ensure_ascii=False` to `json.dumps()` when returning responses to GPT. It would also be beneficial if the team at OpenAI could reflect this recommendation in the official documentation.
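To illustrate the difference, here is a minimal sketch (the `response` dict is just a placeholder for whatever your function returns):

```python
import json

# Example payload a weather function might return to the model
response = {"weather": "晴れ"}

# Default behavior: multibyte characters are escaped to ASCII sequences
escaped = json.dumps(response)
print(escaped)  # {"weather": "\u6674\u308c"}

# With ensure_ascii=False, the original characters are preserved
compact = json.dumps(response, ensure_ascii=False)
print(compact)  # {"weather": "晴れ"}
```

The second form is both shorter in tokens and closer to the text the model saw during training.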