Question about function completion model tokenization

I am using the gpt-3.5-turbo-0613 and gpt-4-0613 models and am feeding them function definitions like this:

{
    "name": "sendMessage",
    "description": "Send a message to a number",
    "parameters": {
        "type": "object",
        "properties": {
            "contact": {
                "type": "string",
                "description": "The number of the contact, e.g. 1234567890. Must be left empty if user does not specify."
            },
            "message": {
                "type": "string",
                "description": "The message to send, e.g. Hello World. Leave empty if user does not specify."
            }
        },
        "required": ["contact", "message"]
    }
}

I have noticed, however, that many tokens seem to be taken up by all the whitespace. Will removing the unnecessary whitespace hurt performance?
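For reference, this is roughly how I was estimating the overhead, using tiktoken with the cl100k_base encoding (I'm assuming the raw JSON text is what actually gets tokenized, which may not be the case):

import json
import tiktoken

# cl100k_base is the encoding used by the gpt-3.5-turbo and gpt-4 families.
enc = tiktoken.get_encoding("cl100k_base")

function_def = {
    "name": "sendMessage",
    "description": "Send a message to a number",
    "parameters": {
        "type": "object",
        "properties": {
            "contact": {
                "type": "string",
                "description": "The number of the contact, e.g. 1234567890. Must be left empty if user does not specify.",
            },
            "message": {
                "type": "string",
                "description": "The message to send, e.g. Hello World. Leave empty if user does not specify.",
            },
        },
        "required": ["contact", "message"],
    },
}

# Compare a pretty-printed serialization against a fully minified one.
pretty = json.dumps(function_def, indent=4)
compact = json.dumps(function_def, separators=(",", ":"))

print("indented:", len(enc.encode(pretty)), "tokens")
print("minified:", len(enc.encode(compact)), "tokens")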

Welcome @mattia.u.nee

Are you referring to the whitespaces/tabs in the indentation for the JSON?

In that case, they aren’t consuming tokens as they are removed when stringifying the JSON to convert it into a system message.
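Roughly speaking, something like this happens to the pretty-printed structure (a sketch to illustrate the idea, not OpenAI's actual internal code):

import json

function_def = {
    "name": "sendMessage",
    "description": "Send a message to a number",
}

# Serializing without an indent argument produces a single line, so the
# indentation whitespace in your source never reaches the model.
print(json.dumps(function_def))
# {"name": "sendMessage", "description": "Send a message to a number"}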


Perfect, thank you very much for the info!

[quote="sps, post:2, topic:297000, full:true"]
Welcome @mattia.u.nee

Are you referring to the whitespaces/tabs in the indentation for the JSON?

In that case, they aren’t consuming tokens as they are removed when stringifying the JSON to convert it into a system message.
[/quote]

There are two tokens to be saved per element if you strip the carriage returns and any level of indentation.

However, you are showing us what is sent to the API, not what the AI actually receives as language, which is significantly different. There is no indentation left in the non-JSON presentation of the functions that the AI actually sees.

They have tried to obfuscate the actual format (which doesn't need more than being added to the system prompt the right way) by only giving Python examples.
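For what it's worth, community reconstructions of that rendered text (not an official spec; the exact wording below is an approximation) look roughly like a TypeScript-style declaration block with no indentation, and you can tokenize one yourself to see where the real overhead is:

import tiktoken

# Community-reported approximation of how function definitions are rendered
# into the system message. This is a reconstruction, not documented by
# OpenAI, so treat it as illustrative only.
rendered_functions = """namespace functions {

// Send a message to a number
type sendMessage = (_: {
// The number of the contact, e.g. 1234567890. Must be left empty if user does not specify.
contact: string,
// The message to send, e.g. Hello World. Leave empty if user does not specify.
message: string,
}) => any;

} // namespace functions"""

enc = tiktoken.get_encoding("cl100k_base")
print(len(enc.encode(rendered_functions)), "tokens for the rendered function block")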