How would I know in advance, before sending the request, that the number of tokens is going to fit within the model's limit?
I found the library npm/gpt-tokenizer, but it doesn't seem to cover the size of functions. Can I just use encode with the stringified array of functions, like encode(JSON.stringify(functions)).length? Or is there a better solution for that?
I think the simplest way would be to count the tokens in the prompt and check that count against the limit of the model you are running against. This will also let you figure out how many tokens are left for GPT to generate in its output.
Yes, it does: tiktoken takes a piece of text and counts the number of tokens in it.
You should do this with the sum of all your prompts (and their roles).
This will give you the total number of input tokens.
The number of tokens the model can then generate on top of that is (model_limit - input_tokens).
This way, you will know whether it will fit or not.
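The check above can be sketched as follows. This is a minimal sketch, not a definitive implementation: the `countTokens` function would normally be gpt-tokenizer's `encode(text).length` (or tiktoken in Python), and the per-message overhead constant is an assumption standing in for the few scaffolding tokens OpenAI's chat format adds per message.

```javascript
// Rough token-budget check before sending a chat request.
// `countTokens` is passed in so the sketch stays self-contained;
// in practice it would be gpt-tokenizer's `encode(text).length`.
function fitsModelLimit(messages, maxOutputTokens, modelLimit, countTokens) {
  // Assumed per-message scaffolding cost; the exact value is undocumented.
  const PER_MESSAGE_OVERHEAD = 4;

  // Sum tokens across every message, including the role strings.
  const inputTokens = messages.reduce(
    (sum, m) =>
      sum + countTokens(m.role) + countTokens(m.content) + PER_MESSAGE_OVERHEAD,
    0
  );

  return {
    inputTokens,
    remainingForOutput: modelLimit - inputTokens,
    fits: inputTokens + maxOutputTokens <= modelLimit,
  };
}

// Example with a naive whitespace "tokenizer" as a stand-in:
const naiveCount = (text) => text.split(/\s+/).filter(Boolean).length;
const result = fitsModelLimit(
  [{ role: "user", content: "Hello there, how are you?" }],
  100, // desired output budget
  4096, // hypothetical model limit
  naiveCount
);
```

Swapping `naiveCount` for a real tokenizer is the only change needed to use this against an actual model limit.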
Getting the exact token count is going to be difficult because we have no clue how they’re embedding functions into the prompt at this point. I would hope they’re not just shoving the JSON schema in there as that would be a massive waste of tokens.
I’m using gpt-tokenizer in my projects as well and your current approach is probably a reasonable approximation of the added token overhead.
I haven't tried this yet, but you could count the input tokens yourself and compare that against what comes back in the usage section of the API response.
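One way to run that comparison is to compute a rough upper bound for the function overhead by tokenizing the stringified definitions, then check it against `usage.prompt_tokens` from a real response. This is only an approximation, since how the API internally encodes functions is undocumented; the 4-characters-per-token heuristic below is a stand-in for a real tokenizer such as gpt-tokenizer's `encode`, and the function definition is hypothetical.

```javascript
// Rough upper-bound estimate of the token overhead added by function
// definitions: serialize them to JSON and count tokens in that string.
// `countTokens` would normally be gpt-tokenizer's `encode(text).length`.
function estimateFunctionTokens(functions, countTokens) {
  return countTokens(JSON.stringify(functions));
}

// Stand-in tokenizer using the common ~4 characters/token heuristic.
const approxCount = (text) => Math.ceil(text.length / 4);

// Hypothetical function definition for illustration only.
const functions = [{ name: "get_weather" }];
const overhead = estimateFunctionTokens(functions, approxCount);
```

Comparing `overhead` plus your message token count to `usage.prompt_tokens` from an actual request would tell you how far off the approximation is.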