Max tokens - how to get GPT to use the maximum available tokens?

Using GPT3.5 API.

What is the default for max_tokens? Would the default be the maximum available?

I just want the AI to use the maximum tokens available to it.

To do this, it seems I have to calculate how many tokens I’ve used and tell the API how many are likely left for the response. Is this correct, or is there a simpler way?

Also, is GPT aware of the max_tokens parameter, and does it attempt to limit its response? Or is this a hard limit that just cuts off the response when it goes over?


Hi,

Yes, omitting the max_tokens parameter for GPT3.5 will give you the maximum token size possible for the reply.

It is not “aware” of the limits; it will generate tokens until a stop token is generated or the limit is reached.

The max_tokens parameter in the GPT-3.5 API is optional. If set, it limits the response to that number of tokens. If not set, the limit is the model’s max capacity (4096 tokens for GPT-3.5 Turbo) minus the tokens used in the prompt. The max_tokens serves as a hard cap, truncating the response if the limit is reached. To utilize the maximum tokens available, you’d need to calculate the remaining tokens based on the tokens used in your prompt.
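As a rough sketch of the arithmetic described above: the helper below subtracts the prompt's token count from the context window to get the largest `max_tokens` value that still fits. The function names and the `safety_margin` parameter are hypothetical, chosen for illustration, and the prompt token count is assumed to come from a tokenizer run beforehand. The second helper relies on the Chat Completions API reporting `finish_reason` as `"length"` when a reply was cut off by the token limit, as opposed to `"stop"` when the model ended on its own.

```python
def remaining_completion_tokens(context_window: int,
                                prompt_tokens: int,
                                safety_margin: int = 0) -> int:
    """Largest max_tokens value that still fits in the context window.

    context_window: total tokens the model can handle (prompt + reply),
                    e.g. 4096 for the GPT-3.5 Turbo model discussed here.
    prompt_tokens:  tokens already consumed by the prompt (counted elsewhere).
    safety_margin:  hypothetical buffer for per-message formatting overhead.
    """
    if prompt_tokens < 0 or safety_margin < 0:
        raise ValueError("token counts must be non-negative")
    # Never return a negative budget; 0 means the prompt already fills the window.
    return max(context_window - prompt_tokens - safety_margin, 0)


def was_truncated(finish_reason: str) -> bool:
    """True if the reply was cut off by the token limit rather than ending naturally."""
    return finish_reason == "length"


budget = remaining_completion_tokens(4096, 1200, safety_margin=16)
print(budget)  # 2880
```

If `was_truncated` returns true for a response, the hard cap was hit and the text is incomplete; the model did not shorten its answer to fit.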

@Foxalabs is this true for `gpt-4-1106-preview`? I am prompting my model to do some “parsing/text extraction” but it’s obviously stopping before it should.

I tested this in the ChatGPT interface and it worked. So I am puzzled. What is going on?

related: How to use the max number of tokens with the openai api? - Stack Overflow

GPT-4-Turbo (i.e. the preview model) is limited to 4k output tokens and is tuned to produce compact responses. ChatGPT-4 makes use of the Turbo model, so the responses should be similar unless you have different system prompting, temperatures, etc. If you are trying to get the most tokens out per call, then you are fighting against a model that wants to limit its output. This is typically bypassed by splitting the work into sections, requesting multiple smaller responses across several API calls, and concatenating the results.
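A minimal sketch of that split-and-concatenate approach, under a few assumptions: sections are sized by character count as a crude stand-in for tokens, breaks happen on line boundaries where possible, and `call_model` is a hypothetical function you supply that sends one section to the API and returns the reply text. None of these names come from the OpenAI library.

```python
from typing import Callable, List


def split_into_sections(text: str, max_chars: int) -> List[str]:
    """Split text into pieces of at most max_chars, preferring line boundaries."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    sections: List[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # A single line longer than max_chars has to be hard-split.
        while len(line) > max_chars:
            if current:
                sections.append(current)
                current = ""
            sections.append(line[:max_chars])
            line = line[max_chars:]
        # Start a new section when adding this line would exceed the limit.
        if len(current) + len(line) > max_chars:
            sections.append(current)
            current = ""
        current += line
    if current:
        sections.append(current)
    return sections


def process_in_sections(text: str,
                        call_model: Callable[[str], str],
                        max_chars: int) -> str:
    """Run call_model on each section in order and concatenate the replies."""
    return "".join(call_model(section) for section in split_into_sections(text, max_chars))
```

Whether this works depends on the task: each section is processed without knowledge of the others, so it suits extraction that is local to a passage and not extraction that needs the whole document at once.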

I want the model to extract the text I’m asking for in one shot. I cannot easily split the text and have my stuff work. If I could I wouldn’t be asking GPT4 to do the parsing/extraction for me (I’d parse it myself).

Would using regular gpt-4 solve my problem? (doesn’t seem like it did)

Btw, I’m not using turbo. I’m using gpt4.