Question about function completion model tokenization

I am using the gpt-3.5-turbo-0613 and gpt-4-0613 models and am feeding them function definitions like this:

{
    "name": "sendMessage",
    "description": "Send a message to a number",
    "parameters": {
        "type": "object",
        "properties": {
            "contact": {
                "type": "string",
                "description": "The number of the contact, e.g. 1234567890. Must be left empty if user does not specify."
            },
            "message": {
                "type": "string",
                "description": "The message to send, e.g. Hello World. Leave empty if user does not specify."
            }
        },
        "required": ["contact", "message"]
    }
}

I have noticed, however, that many tokens seem to be taken up by all the whitespace. Will removing the unnecessary whitespace hurt performance?
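For reference, this is roughly how I was estimating the overhead, using tiktoken with the cl100k_base encoding (I'm assuming the raw JSON text is what actually gets tokenized, which may not be the case):

import json
import tiktoken

# cl100k_base is the encoding used by the gpt-3.5-turbo and gpt-4 families.
enc = tiktoken.get_encoding("cl100k_base")

function_def = {
    "name": "sendMessage",
    "description": "Send a message to a number",
    "parameters": {
        "type": "object",
        "properties": {
            "contact": {
                "type": "string",
                "description": "The number of the contact, e.g. 1234567890. Must be left empty if user does not specify.",
            },
            "message": {
                "type": "string",
                "description": "The message to send, e.g. Hello World. Leave empty if user does not specify.",
            },
        },
        "required": ["contact", "message"],
    },
}

# Compare a pretty-printed serialization against a fully minified one.
pretty = json.dumps(function_def, indent=4)
compact = json.dumps(function_def, separators=(",", ":"))

print("indented:", len(enc.encode(pretty)), "tokens")
print("minified:", len(enc.encode(compact)), "tokens")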

Welcome @mattia.u.nee

Are you referring to the whitespaces/tabs in the indentation for the JSON?

In that case, they aren’t consuming tokens as they are removed when stringifying the JSON to convert it into a system message.
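Roughly speaking, something like this happens to the pretty-printed structure (a sketch to illustrate the idea, not OpenAI's actual internal code):

import json

function_def = {
    "name": "sendMessage",
    "description": "Send a message to a number",
}

# Serializing without an indent argument produces a single line, so the
# indentation whitespace in your source never reaches the model.
print(json.dumps(function_def))
# {"name": "sendMessage", "description": "Send a message to a number"}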


Perfect, thank you very much for the info!

[quote="sps, post:2, topic:297000, full:true"]
Welcome @mattia.u.nee

Are you referring to the whitespaces/tabs in the indentation for the JSON?

In that case, they aren’t consuming tokens as they are removed when stringifying the JSON to convert it into a system message.
[/quote]

There are two tokens to be saved per element if you strip the carriage returns and any level of indentation.

However, you are showing us what is sent to the API, not what the AI actually receives as language, which is significantly different. There is no indentation left in the non-JSON presentation of the functions that the AI actually sees.

They have tried to obfuscate the actual format (which doesn't need more than being added to the system prompt the right way) by only giving Python examples.
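For what it's worth, community reconstructions of that rendered text (not an official spec; the exact wording below is an approximation) look roughly like a TypeScript-style declaration block with no indentation, and you can tokenize one yourself to see where the real overhead is:

import tiktoken

# Community-reported approximation of how function definitions are rendered
# into the system message. This is a reconstruction, not documented by
# OpenAI, so treat it as illustrative only.
rendered_functions = """namespace functions {

// Send a message to a number
type sendMessage = (_: {
// The number of the contact, e.g. 1234567890. Must be left empty if user does not specify.
contact: string,
// The message to send, e.g. Hello World. Leave empty if user does not specify.
message: string,
}) => any;

} // namespace functions"""

enc = tiktoken.get_encoding("cl100k_base")
print(len(enc.encode(rendered_functions)), "tokens for the rendered function block")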