How to know # of tokens beforehand when I make a function calling + chat history request with Node.js

I’ve got the following code:

const completion = await openai.createChatCompletion(
  { model, messages, functions, stream: true },
  { responseType: 'stream' }
);

How would I know in advance, before sending this request, that the # of tokens is going to fit within the model's limit?

I found this library npm/gpt-tokenizer, but it doesn’t seem to cover the size of functions. Can I just use encode with a stringified array of functions, like encode(JSON.stringify(functions)).length? Or is there a better solution for that?
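For what it's worth, the stringify approach can be sketched like this. Everything here is an assumption for illustration: `estimateTokens` and `countRequestTokens` are hypothetical helper names, the ~4-characters-per-token fallback is only a rough heuristic, and in a real project you would swap `estimateTokens` for `encode(text).length` from gpt-tokenizer.

```javascript
// Very rough fallback estimate: ~4 characters per token for English text.
// In practice, replace this with gpt-tokenizer's encode(text).length.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Approximate the prompt size of a chat request: message contents plus the
// JSON-stringified function definitions, with a small per-message overhead
// (the overhead value is a guess, since the exact chat format is not public).
function countRequestTokens(messages, functions, perMessageOverhead = 4) {
  const messageTokens = messages.reduce(
    (sum, m) => sum + estimateTokens(m.content || '') + perMessageOverhead,
    0
  );
  const functionTokens = functions
    ? estimateTokens(JSON.stringify(functions))
    : 0;
  return messageTokens + functionTokens;
}

const messages = [{ role: 'user', content: 'What is the weather in Paris?' }];
const functions = [
  {
    name: 'get_weather',
    parameters: { type: 'object', properties: { city: { type: 'string' } } },
  },
];
console.log(countRequestTokens(messages, functions));
```

Treat the result as an estimate with some safety margin, not an exact count.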

I think the simplest way would be to count the tokens in the prompt and check them against the limit of the model you are running against. Additionally, this will also allow you to figure out how many tokens are left for the output GPT generates.

I’m not sure whether this is JS or Java, but the OpenAI docs recommend gpt-3-encoder (npm) for JavaScript, while GitHub - openai/tiktoken (a fast BPE tokeniser for use with OpenAI’s models) is recommended for Python.

For reference: OpenAI Platform


Sorry, but it doesn’t answer my question.

Yes it does.
tiktoken lets you take a text and count the number of tokens in it.
You should do this with the sum of all your prompts (and their roles).
This will give you the total number of input tokens.
The total number of tokens the model can generate in addition is then (model_limit - input_tokens).
This way, you will know whether it will fit or not.
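The arithmetic above can be sketched as follows. The `MODEL_LIMITS` table and the `remainingTokens` helper are illustrative, not an official API; always check the documented context-window size for the exact model you use.

```javascript
// Illustrative context-window sizes; verify against the model docs.
const MODEL_LIMITS = {
  'gpt-3.5-turbo': 4096,
  'gpt-3.5-turbo-16k': 16384,
};

// Returns how many tokens remain for the completion, or -1 if the
// input alone already exceeds the model's context window.
function remainingTokens(model, inputTokens) {
  const limit = MODEL_LIMITS[model];
  if (limit === undefined) throw new Error(`Unknown model: ${model}`);
  return inputTokens <= limit ? limit - inputTokens : -1;
}

console.log(remainingTokens('gpt-3.5-turbo', 1000)); // 3096 tokens left for output
```

If you also set max_tokens on the request, make sure inputTokens + max_tokens stays within the limit.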

Getting the exact token count is going to be difficult because we have no clue how they’re embedding functions into the prompt at this point. I would hope they’re not just shoving the JSON schema in there as that would be a massive waste of tokens.

I’m using gpt-tokenizer in my projects as well and your current approach is probably a reasonable approximation of the added token overhead.

I haven’t tried this yet but you might try counting input tokens and compare that to what you get back in the usage section of the response.
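That calibration idea could look something like this. The `estimateError` helper and the sample `usage` object are made up for illustration; the real `usage` block comes back on a (non-streaming) chat completion response.

```javascript
// Compare a local token estimate against the usage block the API returns.
// Positive means the local count under-counted; negative means it over-counted.
function estimateError(localEstimate, usage) {
  return usage.prompt_tokens - localEstimate;
}

// Made-up sample response for illustration.
const sampleResponse = {
  usage: { prompt_tokens: 57, completion_tokens: 21, total_tokens: 78 },
};

const localEstimate = 50; // e.g. from gpt-tokenizer
console.log(estimateError(localEstimate, sampleResponse.usage)); // 7
```

Running this against a few real responses would tell you how much padding to add to your local estimate for the function-definition overhead.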


Someone actually posted a better approach here:
