Function execution result: how to produce fewer tokens?

As far as I understand the documentation, function-role messages accept content that is either a JSON object or any other AI-readable information, including raw text. When I use JSON, I end up spending too many tokens.

I would expect to spend about half as many tokens on it. Any ideas on how to reduce the number of tokens in a function execution response? The only option I can see is to optimise my JSON object and strip out useless info, but I would still like to spend fewer tokens. Is there some alternative format to JSON that doesn't require so many tokens?


Are you using the function-calling behavior? If so, you'll need to stick with JSON in the response; that's how GPT is programmed to respond.

Otherwise, yeah, JSON is not the most token-efficient response format if you are trying to minimize usage. Outside of function-calling, you can try sending or asking for YAML, or newline-delimited text:

success: true
hello: word
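
A minimal sketch of that idea, assuming the function result is a flat Python dict (the keys and values below are just the example above); it serializes the same payload as JSON and as newline-delimited key: value lines:

```python
# Sketch: the same function result as JSON vs. flat "key: value" lines.
# Assumes a simple, flat dict; nested structures need more care.
import json

result = {"success": True, "hello": "word"}  # example payload from this thread

as_json = json.dumps(result)
as_lines = "\n".join(f"{k}: {v}" for k, v in result.items())

print(as_json)   # {"success": true, "hello": "word"}
print(as_lines)  # success: True
                 # hello: word
```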

Yep, I also thought of YAML. It reduces the number of tokens by more than half, but I wanted to make sure that OpenAI's models understand it as well as JSON.
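
For reference, a hedged sketch of the YAML route, assuming PyYAML is installed (the payload here is made up for illustration); yaml.safe_dump drops the quotes, braces, and commas that make JSON expensive:

```python
# Sketch, assuming PyYAML (pip install pyyaml); payload is illustrative only.
import json
import yaml

result = {
    "success": True,
    "items": [{"id": 1, "name": "widget"}, {"id": 2, "name": "gadget"}],
}

print(json.dumps(result))                       # quoted keys, braces, commas
print(yaml.safe_dump(result, sort_keys=False))  # bare keys, indentation only
```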

You can “produce” fewer tokens by using the right tokenizer for the model:

[Screenshot: 100k-tokenizer-function]

Spend half as much as that and you're saving $0.016 per thousand function returns.

The AI isn't rigid code that will crash if JSON is mistyped or unenclosed (just don't have it submit that to another function). You can see whether it performs even better when you ditch the quotes, use more common tokens seen in natural language, and keep braces that agree in their leading spaces:

[Screenshot: 100k-tokenizer-function-noquote]

(I added two more tokens after to show you character compressibility)

AI understands most anything. Fine-tuning on a function role doesn't destroy its language ability for other things you could return there; I suspect the tuning just tells it to act on that role as well. We just hope they don't put a JSON checker on the endpoint the way they already use a list/dictionary checker on the input as a whole to prevent raw input.

(Note: your actual code might have to assemble this string itself instead of just passing native data types.)
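
As a rough sketch of what "assemble this string" could look like (the to_plain helper below is hypothetical, not an OpenAI API), assuming the result is a nested dict/list:

```python
# Hypothetical helper: flatten a nested result into unquoted, indented text.
def to_plain(value, indent=0):
    pad = " " * indent
    if isinstance(value, dict):
        lines = []
        for k, v in value.items():
            if isinstance(v, (dict, list)):
                lines.append(f"{pad}{k}:")
                lines.append(to_plain(v, indent + 2))
            else:
                lines.append(f"{pad}{k}: {v}")
        return "\n".join(lines)
    if isinstance(value, list):
        return "\n".join(
            to_plain(v, indent) if isinstance(v, (dict, list)) else f"{pad}- {v}"
            for v in value
        )
    return f"{pad}{value}"

result = {"success": True, "user": {"id": 42, "name": "Ada"}, "tags": ["alpha", "beta"]}
print(to_plain(result))
# success: True
# user:
#   id: 42
#   name: Ada
# tags:
#   - alpha
#   - beta
```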

I would say the tuning that exists for function returns, which is probably not as extensive as the tuning for how to call functions, likely results in a behavior one could describe as “integrate this knowledge”, “report on this information”, or “handle this error”.

Hi @finom
Just want to point out that the tokenizer you're using in the shared screenshot is for GPT-3 models.

Newer models use the cl100k_base encoding, for which you can use tiktoken to count tokens.
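
A quick sketch of counting tokens with tiktoken and the cl100k_base encoding (the strings below are just the thread's example):

```python
# Sketch, assuming tiktoken is installed (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # or tiktoken.encoding_for_model("gpt-3.5-turbo")

json_text = '{"success": true, "hello": "word"}'
plain_text = "success: true\nhello: word"

print(len(enc.encode(json_text)))   # tokens in the JSON form
print(len(enc.encode(plain_text)))  # tokens in the plain form
```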


Can you give me a link to this tokeniser? I can’t find it.