Performance impact of max_tokens setting

I have occasionally read that assigning a high value to max_tokens has performance implications.

Can anyone with expert knowledge confirm or correct this anecdote?

Note: I am computing an accurate count of request_tokens in advance (using tiktoken).

What I’m trying to ascertain is whether it makes sense to set max_tokens = model_token_limit - request_tokens or to estimate a lower value, if doing so will improve API response times.

There might be more delay if you ask for longer responses (that's what max_tokens is for, anyway).

In my product, users are able to set their own max_tokens and I calculate the final max_tokens with your formula: max_tokens = model_token_limit - request_tokens
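A minimal sketch of that calculation, assuming a user-supplied value is treated as an upper bound and clamped to whatever the context window still allows. The function name, parameter names, and the optional safety margin are hypothetical, not part of any SDK.

```typescript
// Hypothetical helper: the name, parameters, and safetyMargin are illustrative assumptions.
function resolveMaxTokens(
    modelTokenLimit: number,   // total context window of the model
    requestTokens: number,     // tokens already used by the prompt
    userMaxTokens?: number,    // optional value chosen by the user
    safetyMargin = 0,          // optional slack in case the prompt count is slightly off
): number {
    // Room left for the completion: model_token_limit - request_tokens (minus any margin).
    const available = modelTokenLimit - requestTokens - safetyMargin;
    if (available <= 0) {
        throw new Error("Prompt leaves no room for a completion");
    }
    // With no user preference, use everything available; otherwise never exceed it.
    return userMaxTokens === undefined
        ? available
        : Math.min(userMaxTokens, available);
}
```

For example, `resolveMaxTokens(4096, 1000)` yields 3096, while `resolveMaxTokens(4096, 1000, 500)` keeps the user's 500.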


What about the count function?
I'm trying to do the same using tiktoken, but it is not working properly.
Here is my TypeScript code (npm i --save tiktoken):

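import * as tiktoken from "tiktoken";
import { ChatCompletionRequestMessage } from "openai"; // assumed: type from the openai v3 SDK

    // Counts tokens in message contents only; per-message role/formatting
    // overhead is not included, so the total can undercount what the API reports.
    private tokenCounter(model: tiktoken.TiktokenModel, messages: ChatCompletionRequestMessage[]): number {
        const enc = tiktoken.encoding_for_model(model);

        let nTokens = 0;
        messages.forEach(m => {
            // content is optional on the message type, so fall back to ""
            const encMessage = enc.encode(m.content ?? "");
            nTokens += encMessage.length;
        });

        enc.free(); // release the WASM-backed encoder
        return nTokens;
    }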

The result, in my case, is wrong :upside_down_face:
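One likely reason for a mismatch is that chat requests carry a few extra tokens per message (role and formatting) on top of the content, plus a few tokens that prime the reply. Below is a rough sketch of a counter that includes that overhead. The default constants (3 per message, 1 per name, 3 for reply priming) follow the OpenAI cookbook guidance for gpt-3.5-turbo and gpt-4 as I understand it, and should be treated as assumptions that may differ for other models. `ChatMessage`, `Encode`, and `countChatTokens` are hypothetical names; the encoder is passed in so the function does not depend on a specific tokenizer package.

```typescript
// Hypothetical types and helper; not part of tiktoken or the openai SDK.
interface ChatMessage {
    role: string;
    content: string;
    name?: string;
}

// Any function that turns text into a list of token ids.
type Encode = (text: string) => ArrayLike<number>;

function countChatTokens(
    messages: ChatMessage[],
    encode: Encode,
    tokensPerMessage = 3, // assumed per-message overhead
    tokensPerName = 1,    // assumed extra token when a name is present
    replyPriming = 3,     // assumed tokens that prime the assistant reply
): number {
    let total = replyPriming;
    for (const m of messages) {
        total += tokensPerMessage;
        total += encode(m.role).length;
        total += encode(m.content).length;
        if (m.name !== undefined) {
            total += encode(m.name).length + tokensPerName;
        }
    }
    return total;
}
```

With tiktoken this could be called as `countChatTokens(messages, t => enc.encode(t))`, where `enc` comes from `encoding_for_model(model)`, followed by `enc.free()` once counting is done.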