Hi, this is probably a stupid question, but I have a doubt: each model has a maximum number of tokens (for example, gpt-3.5-turbo has a maximum of 4,096 tokens), but does that limit apply to each prompt (prompt + answer) or to everything my code sends in total?
Let me explain: are the 4,096 tokens the maximum for a single prompt, or, if I write a program that sends multiple prompts, do the 4,096 tokens apply to the whole program (so across all the prompts)?
I don’t know if I managed to explain myself.
Not a stupid question at all; it confuses many. 4096 tokens is your entire world: you must do everything within that limit. You have to ask your question, pass in any history of past questions and answers, and leave space for the latest answer, all within that same limit. Hope that helps.
A morbid way to put it, but yes, that statement is on point.
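To make that concrete, here is a minimal sketch of a single chat-completion call (assuming the official `openai` Python package with the v1-style client; the model and `max_tokens` values are only examples). Everything in `messages`, plus the room you reserve for the reply via `max_tokens`, has to fit inside the model's 4096-token window:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The whole conversation you send counts against the 4096-token window,
# together with the space you reserve for the reply via max_tokens.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a token?"},                  # earlier turn
    {"role": "assistant", "content": "A token is a chunk of text."},  # earlier reply
    {"role": "user", "content": "And how many fit in one request?"},  # new question
]

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=messages,
    max_tokens=500,  # tokens reserved for the answer, inside the same 4096 budget
)
print(response.choices[0].message.content)
```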
Ok, thanks, that’s a little clearer.
So, in summary, the max tokens include prompt and response (and any “history” in the case of chat completion).
But if I send two prompts (therefore two requests), do I have 4096 tokens for the first prompt and 4096 tokens for the second, or 4096 tokens for both combined? That's my doubt.
4096 for each: the 4096-token limit applies to a single API call/request.
There is also a total limit, the rate limit, which is set at the organisation level and caps the total number of tokens you can send per minute.
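As a rough illustration of how the two limits differ (a sketch only; the retry count and back-off times are arbitrary assumptions): the context window is enforced per request, while the TPM rate limit only throttles how many tokens your organisation sends per minute, so hitting it is usually handled by backing off and retrying:

```python
import time
from openai import OpenAI, RateLimitError

client = OpenAI()

def ask(messages, max_tokens=500, retries=3):
    """Each call gets its own 4096-token context budget; the org-level
    TPM rate limit only caps how many tokens you send per minute."""
    for attempt in range(retries):
        try:
            return client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=messages,
                max_tokens=max_tokens,
            )
        except RateLimitError:
            time.sleep(2 ** attempt)  # hit the per-minute limit: back off and retry
    raise RuntimeError("still rate limited after retries")
```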
For prompt and reply A to influence prompt and reply B in some way, at least some tokens from prompt and reply A must make their way into prompt B. So it is always 4096 tokens no matter how many questions it spans, as long as you want to retain context and relevance; if you don't care, then you get a fresh 4096 to play with each time.
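In code, that choice might look roughly like the sketch below (the names and the character-based token estimate are made up for illustration and deliberately crude): either you keep appending past turns and trim the oldest ones so everything still fits in 4096, or you start each request with a fresh `messages` list and get the full budget every time.

```python
from openai import OpenAI

client = OpenAI()

MODEL_LIMIT = 4096        # gpt-3.5-turbo context window
RESERVED_FOR_REPLY = 500  # what we pass as max_tokens

def rough_token_count(messages):
    # Crude estimate (~4 characters per token); use tiktoken for real counts.
    return sum(len(m["content"]) for m in messages) // 4

history = [{"role": "system", "content": "You are a helpful assistant."}]

def ask_with_history(question):
    history.append({"role": "user", "content": question})
    # Drop the oldest non-system turns until prompt + reply fit in the window.
    while rough_token_count(history) + RESERVED_FOR_REPLY > MODEL_LIMIT and len(history) > 2:
        history.pop(1)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=history,
        max_tokens=RESERVED_FOR_REPLY,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})  # keep context for next turn
    return reply
```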
Ok, I know about the TPM rate limits for the API, but isn't the TPM usually higher than the max tokens allowed by the model? That's why I thought they were separate things.
I’m sorry but this token thing really confuses me
Perfect, thank you very much! It's exactly what I needed.
@deb23 Not at all a silly question.
It has confused many people.
MAX_TOKENS is the length of the generated response.
But in total, the prompt + generated response length shouldn't exceed the limit of the model you are using.
It's up to you what MAX_TOKENS you set; whatever is left is for your prompt (since you may be sending the complete chat history).
For example:
gpt-3.5-turbo has a limit of 4096 tokens, so if I set MAX_TOKENS = 1000, the generated response will stay within 1000 tokens. That leaves 3096 tokens for your prompt, and you can also count the prompt tokens before sending them to GPT.
This is how it goes.
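If it helps, here is a small sketch of that bookkeeping with the tiktoken library (the 1000-token figure just mirrors the example above, and chat formatting adds a few extra tokens per message that this rough count ignores):

```python
import tiktoken

MODEL_LIMIT = 4096
MAX_TOKENS = 1000  # reserved for the generated response, as in the example above

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

prompt = "Summarise the following text: ..."  # whatever you plan to send
prompt_tokens = len(enc.encode(prompt))

# Chat messages add a few tokens of formatting overhead per message,
# so leave a small safety margin on top of this count.
budget_for_prompt = MODEL_LIMIT - MAX_TOKENS  # 3096 tokens in this example
print(f"prompt uses {prompt_tokens} of the {budget_for_prompt} tokens available")

if prompt_tokens > budget_for_prompt:
    print("prompt too long: shorten it or lower MAX_TOKENS")
```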