Function execution result: how to produce fewer tokens?

As far as I understand the documentation, function-role messages accept content that is either a JSON object or any other AI-readable information, including raw text. When I use JSON, I end up spending too many tokens.

I would expect to spend about half as many tokens on it. Any ideas on how to reduce the number of tokens in a function execution response? The only option I can see is to optimise my JSON object and strip out useless info, but I would still like to spend fewer tokens. Is there some alternative format to JSON that doesn't require so many tokens?


Are you using the function-calling behavior? If so, you'll need to stick with JSON in the response; that's how GPT is programmed to respond.

Otherwise, yeah, JSON is not the most token-efficient response format if you are trying to minimize usage. Outside of function-calling, you can try sending or asking for YAML, or newline-delimited text:

success: true
hello: word
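
A minimal sketch of that idea, assuming the function result is a flat Python dict (the keys and values below are just the example above); it serializes the same payload as JSON and as newline-delimited key: value lines:

```python
# Sketch: the same function result as JSON vs. flat "key: value" lines.
# Assumes a simple, flat dict; nested structures need more care.
import json

result = {"success": True, "hello": "word"}  # example payload from this thread

as_json = json.dumps(result)
as_lines = "\n".join(f"{k}: {v}" for k, v in result.items())

print(as_json)   # {"success": true, "hello": "word"}
print(as_lines)  # success: True
                 # hello: word
```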

Yep, I also thought of YAML. It reduces the number of tokens by more than half, but I wanted to make sure that OpenAI's models understand it as well as JSON.
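
For reference, a hedged sketch of the YAML route, assuming PyYAML is installed (the payload here is made up for illustration); yaml.safe_dump drops the quotes, braces, and commas that make JSON expensive:

```python
# Sketch, assuming PyYAML (pip install pyyaml); payload is illustrative only.
import json
import yaml

result = {
    "success": True,
    "items": [{"id": 1, "name": "widget"}, {"id": 2, "name": "gadget"}],
}

print(json.dumps(result))                       # quoted keys, braces, commas
print(yaml.safe_dump(result, sort_keys=False))  # bare keys, indentation only
```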

You can “produce” fewer tokens by using the right tokenizer for the model:

[Screenshot: 100k-tokenizer-function]

Spend half as much as that and you're saving $0.016 per thousand function returns.

The AI isn't rigid code that will crash if JSON is mistyped or unenclosed (just don't have it submit that to another function). You can see whether it performs even better when you ditch the quotes, use more common tokens seen in natural language, and keep braces that agree in their leading spaces:

[Screenshot: 100k-tokenizer-function-noquote]

(I added two more tokens after to show you character compressibility)

AI understands most anything. Fine-tuning on a function role doesn't destroy its language ability for other things you could return there; I suspect the tuning just tells it to act on that role as well. We just hope they don't put a JSON checker on the endpoint the way they already use a list/dictionary checker on the input as a whole to prevent raw input.

(Note: your actual code might have to assemble this string itself instead of just passing native data types.)
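
As a rough sketch of what "assemble this string" could look like (the to_plain helper below is hypothetical, not an OpenAI API), assuming the result is a nested dict/list:

```python
# Hypothetical helper: flatten a nested result into unquoted, indented text.
def to_plain(value, indent=0):
    pad = " " * indent
    if isinstance(value, dict):
        lines = []
        for k, v in value.items():
            if isinstance(v, (dict, list)):
                lines.append(f"{pad}{k}:")
                lines.append(to_plain(v, indent + 2))
            else:
                lines.append(f"{pad}{k}: {v}")
        return "\n".join(lines)
    if isinstance(value, list):
        return "\n".join(
            to_plain(v, indent) if isinstance(v, (dict, list)) else f"{pad}- {v}"
            for v in value
        )
    return f"{pad}{value}"

result = {"success": True, "user": {"id": 42, "name": "Ada"}, "tags": ["alpha", "beta"]}
print(to_plain(result))
# success: True
# user:
#   id: 42
#   name: Ada
# tags:
#   - alpha
#   - beta
```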

I would say the tuning that exists for function returns, which is probably not as extensive as the tuning for how to call functions, likely results in a behavior one could describe as “integrate this knowledge”, “report on this information”, or “handle this error”.

Hi @finom
Just want to point out that the tokenizer you're using in the shared screenshot is for GPT-3 models.

Newer models use the cl100k_base encoding, for which you can use tiktoken to count tokens.
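
A quick sketch of counting tokens with tiktoken and the cl100k_base encoding (the strings below are just the thread's example):

```python
# Sketch, assuming tiktoken is installed (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # or tiktoken.encoding_for_model("gpt-3.5-turbo")

json_text = '{"success": true, "hello": "word"}'
plain_text = "success: true\nhello: word"

print(len(enc.encode(json_text)))   # tokens in the JSON form
print(len(enc.encode(plain_text)))  # tokens in the plain form
```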


Can you give me a link to this tokeniser? I can’t find it.