Performance impact of max_tokens setting

iamido · June 20, 2023, 10:57pm

I have occasionally read that assigning a high value to max_tokens has performance implications.

Can anyone with expert knowledge confirm or correct this anecdote?

Note: I am generating an accurate pre-count of request_tokens (using tiktoken).

What I’m trying to ascertain is whether it makes sense to set max_tokens = model_token_limit - request_tokens or to estimate a lower value, if doing so will improve API response times.

kjordan · June 21, 2023, 6:27am

There might be more delays if you demand longer responses (that’s what max_tokens is for, anyway).

In my product, users are able to set their own max_tokens and I calculate the final max_tokens with your formula: max_tokens = model_token_limit - request_tokens
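A minimal sketch of that calculation, assuming a hypothetical helper named resolveMaxTokens and an illustrative 4096-token limit (neither comes from the posts above). It caps a user-supplied value at whatever room the prompt leaves in the context window. Note that max_tokens only sets an upper bound on the completion; the model can still stop earlier.

```typescript
// Hypothetical helper: the name, parameters, and example numbers are assumptions
// made for illustration, not part of any OpenAI SDK.
function resolveMaxTokens(
  modelTokenLimit: number,
  requestTokens: number,
  userMaxTokens?: number,
): number {
  // Room left in the context window once the prompt is accounted for.
  const available = modelTokenLimit - requestTokens;
  if (available <= 0) {
    throw new Error("Prompt already fills the model's context window");
  }
  // With no user preference, use everything that is left;
  // otherwise never exceed what the window can still hold.
  return userMaxTokens === undefined
    ? available
    : Math.min(userMaxTokens, available);
}

// Example: a 1200-token prompt against an assumed 4096-token limit.
const cappedBudget = resolveMaxTokens(4096, 1200, 500);
const fullBudget = resolveMaxTokens(4096, 1200);
```

Here cappedBudget keeps the user's smaller value, while fullBudget falls back to model_token_limit - request_tokens.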

omanzelli · July 26, 2023, 12:59pm

What about the count function?
I’m trying to do the same using tiktoken, but it is not working properly.
Here is my TypeScript code (npm i --save tiktoken):

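import * as ticktoken from "tiktoken";

[...]

    // Counts only the tokens in each message's content.
    private tokenCounter(model: ticktoken.TiktokenModel, messages: ChatCompletionRequestMessage[]): number {
        const enc = ticktoken.encoding_for_model(model);
        try {
            let nTokens = 0;
            messages.forEach(m => {
                // content is optional on some messages, so fall back to an empty string.
                nTokens += enc.encode(m.content ?? "").length;
            });
            return nTokens;
        } finally {
            // The WASM-backed encoder has to be released explicitly.
            enc.free();
        }
    }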

The result, in my case, is wrong
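One likely cause of the mismatch is that the chat format adds a few tokens per message on top of the content, plus a few for priming the reply. The sketch below follows the counting approach described in OpenAI's cookbook. The overhead constants are assumptions that apply to models such as gpt-3.5-turbo-0613 and gpt-4-0613 and may differ for others. The countChatTokens name, the ChatMessage interface, and the word-based stand-in encoder are hypothetical and only there to keep the example self-contained.

```typescript
// Minimal message shape used for this sketch (hypothetical, not the SDK type).
interface ChatMessage {
  role: string;
  content?: string;
  name?: string;
}

// Assumed overhead values (per OpenAI cookbook guidance for the -0613 chat models).
const TOKENS_PER_MESSAGE = 3;
const TOKENS_PER_NAME = 1;
const REPLY_PRIMING_TOKENS = 3;

function countChatTokens(
  messages: ChatMessage[],
  encode: (text: string) => ArrayLike<number>,
): number {
  // Every reply is primed with a few tokens before the model starts writing.
  let total = REPLY_PRIMING_TOKENS;
  for (const m of messages) {
    total += TOKENS_PER_MESSAGE;
    total += encode(m.role).length;
    total += encode(m.content ?? "").length;
    if (m.name !== undefined) {
      total += encode(m.name).length + TOKENS_PER_NAME;
    }
  }
  return total;
}

// Stand-in encoder for demonstration only: one "token" per whitespace-separated word.
// With tiktoken, pass (text) => enc.encode(text) instead.
const wordEncoder = (text: string): number[] =>
  text.split(/\s+/).filter(Boolean).map((_, i) => i);

const demoMessages: ChatMessage[] = [
  { role: "system", content: "You are helpful" },
  { role: "user", content: "Hello there" },
];

// Content-only count (what the snippet above measures) versus the full estimate.
const contentOnly = demoMessages.reduce(
  (n, m) => n + wordEncoder(m.content ?? "").length,
  0,
);
const withOverhead = countChatTokens(demoMessages, wordEncoder);
```

With the stand-in encoder, contentOnly is noticeably smaller than withOverhead, which is the kind of gap you would see when comparing a content-only count against the usage reported by the API.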
