Since the token limit for an API call is 4,096, we have to split long documents anyway. How does the 32k model make a difference compared to the 8k model?
Where did you get the idea that the token limit is 4,096?
From the documentation:
Depending on the model used, requests can use up to 4097 tokens shared between prompt and completion. If your prompt is 4000 tokens, your completion can be 97 tokens at most.
The limit is currently a technical limitation, but there are often creative ways to solve problems within the limit, e.g. condensing your prompt, breaking the text into smaller pieces, etc.
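The "breaking the text into smaller pieces" approach from the quoted docs can be sketched as below. This is only an illustration: token counts are approximated here by whitespace-separated words, which undercounts real tokens; for exact counts you would run the text through a tokenizer such as tiktoken with the model's encoding.

```python
def chunk_document(text: str, max_tokens: int = 3000) -> list[str]:
    """Split `text` into chunks of at most `max_tokens` "tokens".

    Words are used as a rough stand-in for tokens (assumption: real
    token counts are usually higher, so leave headroom under the
    model's actual limit).
    """
    words = text.split()
    chunks = []
    for i in range(0, len(words), max_tokens):
        chunks.append(" ".join(words[i:i + max_tokens]))
    return chunks
```

Each chunk is then sent as its own prompt, leaving the remainder of the context window for the completion.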
That documentation was written in the era of the GPT-3 models, e.g. text-davinci-003, which have a token limit of 4,097. That part of the documentation apparently hasn't been updated yet.
The token limit for gpt-4 is 8,192.
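The point of the larger windows is the shared budget: prompt and completion tokens come out of the same context window, so a bigger window leaves more room for both. A minimal sketch, using the window sizes mentioned in this thread:

```python
# Context windows discussed in this thread (tokens shared between
# prompt and completion).
CONTEXT_WINDOWS = {
    "text-davinci-003": 4097,   # GPT-3-era limit quoted in the docs
    "gpt-4": 8192,
    "gpt-4-32k": 32768,
}

def max_completion_tokens(model: str, prompt_tokens: int) -> int:
    """Tokens left for the completion after the prompt is counted."""
    return max(CONTEXT_WINDOWS[model] - prompt_tokens, 0)
```

With a 4,000-token prompt, text-davinci-003 leaves only 97 tokens for the completion (the exact example from the quoted docs), while gpt-4-32k leaves 28,768.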