Max tokens - how to get GPT to use the maximum available tokens?

seb.gibbs · October 15, 2023, 1:51pm

Using GPT3.5 API.

What is the default for max_tokens? Would the default be the maximum available?

I just want the AI to use the maximum tokens available to it.

To do this, it seems I have to calculate how many tokens I’ve used and tell the API how many are likely left for the response. Is this correct, or is there a simpler way?

Also, is GPT aware of the max_tokens parameter, and does it attempt to limit its response? Or is this a hard limit that just cuts off the response when it goes over?

Foxalabs · October 15, 2023, 1:53pm

Hi,

Yes, omitting the max_tokens parameter for GPT3.5 will give you the maximum token size possible for the reply.

It is not “aware” of the limits; it will keep generating tokens until a stop token is generated.
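
Because the model does not plan around the limit, one way to tell a natural stop from a cut-off is the `finish_reason` field on each choice in a Chat Completions response: `"stop"` when a stop token was produced, `"length"` when the token limit ended the reply. A minimal sketch, assuming the response has already been parsed into a plain dict (for example from the raw HTTP JSON); the helper name `was_truncated` is made up for illustration:

```python
def was_truncated(response: dict) -> bool:
    """Return True if the first choice ended because the token limit was hit.

    Assumes `response` is the parsed JSON body of a Chat Completions call,
    where each choice carries a `finish_reason` field.
    """
    choices = response.get("choices", [])
    if not choices:
        return False
    # "length" means generation was cut off by the token limit;
    # "stop" means the model emitted a stop token on its own.
    return choices[0].get("finish_reason") == "length"
```

If this returns True, the reply was cut mid-generation rather than finished by the model.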

zeki.unyildiz · October 15, 2023, 4:35pm

The max_tokens parameter in the GPT-3.5 API is optional. If set, it limits the response to that number of tokens. If not set, the limit is the model’s max capacity (4096 tokens for GPT-3.5 Turbo) minus the tokens used in the prompt. The max_tokens serves as a hard cap, truncating the response if the limit is reached. To utilize the maximum tokens available, you’d need to calculate the remaining tokens based on the tokens used in your prompt.
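
The calculation described above can be sketched as follows. This is only an illustration: the 4096 figure is the one quoted in this post, the prompt token count is assumed to come from a tokenizer such as tiktoken, and the function name and `safety_margin` parameter are hypothetical (the margin leaves room for the few extra tokens the chat message format adds).

```python
CONTEXT_WINDOW = 4096  # figure quoted above for GPT-3.5 Turbo; adjust per model


def remaining_tokens(prompt_tokens: int,
                     context_window: int = CONTEXT_WINDOW,
                     safety_margin: int = 16) -> int:
    """Estimate how many tokens are left for the reply.

    prompt_tokens: token count of the prompt (measured elsewhere, e.g. with
                   a tokenizer; not computed here).
    safety_margin: hypothetical buffer for per-message formatting overhead.
    """
    remaining = context_window - prompt_tokens - safety_margin
    # Never return a negative budget; 0 means the prompt already fills the window.
    return max(remaining, 0)


# Example: a 1000-token prompt leaves 4096 - 1000 - 16 = 3080 tokens.
budget = remaining_tokens(1000)
```

The result could then be passed as `max_tokens`, although per the replies above, simply omitting the parameter has the same effect.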

brandojazz · December 19, 2023, 4:42am

@Foxalabs is this true for `gpt-4-1106-preview`? I am prompting my model to do some “parsing/text extraction” but it’s obviously stopping before it should.

I tested this in the ChatGPT interface and it worked. So I am puzzled. What is going on?

related: How to use the max number of tokens with the openai api? - Stack Overflow

Foxalabs · December 19, 2023, 5:10am

GPT-4-Turbo (i.e. the preview model) is limited to 4k output tokens and is tuned to produce compact responses. ChatGPT-4 makes use of the Turbo model, so their responses should be similar unless you have different system prompting, temperatures, etc. If you are trying to get the most tokens out per call, then you are fighting against a model that wants to limit its output. Typically this is bypassed by splitting the work into sections, requesting multiple smaller responses, and concatenating the results.
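
A rough sketch of that split-and-concatenate approach, under some assumptions: the input can be split on blank lines between paragraphs, section size is measured in characters rather than tokens, and `call_model` is a hypothetical callable that sends one section to the API and returns the reply text.

```python
from typing import Callable, List


def split_into_sections(text: str, max_chars: int) -> List[str]:
    """Group paragraphs into sections of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole as its own section.
    """
    sections: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current section.
            sections.append(current)
            current = para
        else:
            current = candidate
    if current:
        sections.append(current)
    return sections


def process_in_sections(text: str,
                        call_model: Callable[[str], str],
                        max_chars: int = 4000) -> str:
    """Run call_model on each section and join the replies in order."""
    replies = [call_model(section) for section in split_into_sections(text, max_chars)]
    return "\n\n".join(replies)
```

This only helps when each section can be processed independently; if the extraction needs context from the whole document, the sections would have to carry that context with them.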

brandojazz · December 19, 2023, 6:03am

I want the model to extract the text I’m asking for in one shot. I cannot easily split the text and have my stuff work. If I could I wouldn’t be asking GPT4 to do the parsing/extraction for me (I’d parse it myself).

Would using regular gpt-4 solve my problem? (doesn’t seem like it did)

Btw, I’m not using turbo. I’m using gpt4.
