Hello,
For the GPT-3 family of models, I understand that the maximum token length is 4k or 2k, depending on the model chosen. Is this limit the same irrespective of the interface used (i.e., calling the API directly versus using the Playground)?
Hi
Welcome to the community.
The number of tokens per model is well documented. See here for a start: https://platform.openai.com/docs/models/gpt-3-5
Is it the same regardless of where it is used?
Yes. But remember:
- The total number of tokens is the sum of input + output tokens.
- If you are using the chat/completions endpoint, the input token count includes the entire chat history you send with each request (a token-counting sketch follows below).
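If you want to check how much of the window your input already uses, you can count tokens locally before calling the API. A minimal sketch, assuming the tiktoken package is installed (pip install tiktoken); the 4097 figure is for gpt-3.5-turbo:

import tiktoken

# Tokenizers differ between models, so load the one for your model.
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

prompt = "Summarize the key themes of the following text: ..."
prompt_tokens = len(enc.encode(prompt))

# Whatever remains of the context window after the input is the most
# the completion can return.
context_limit = 4097  # gpt-3.5-turbo
print(prompt_tokens, "input tokens,", context_limit - prompt_tokens, "left for output")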
Good luck.
Hi @paul.armstrong, thank you for replying, and for the warm welcome.
Your answer helped validate my understanding of the documentation.
I posted the question because the ChatGPT Playground gave me conflicting information:
Here’s the question I asked the Playground (in Complete mode):
I want to give you an input of about 50,000 words and ask you to extract five key themes from it. Can this be done via the playground? What about through the OpenAI API?
Here’s the response it generated:
Yes, this can be done with OpenAI’s API. OpenAI provides a range of API endpoints that can be used to generate summaries, themes, keywords, topics, and more from text. To extract five key themes from your text, you would need to use OpenAI’s summarization API and then use the “topics” endpoint to return the most relevant themes. Alternatively, you could use the OpenAI playground to manually extract the key themes from the text.
I now realize that the Playground does not give reliable answers to questions of this nature: the same question, asked repeatedly, generates responses that are inconsistent with each other.
To clarify further, the length limit works as follows:
a) Each model has its own length limit. For GPT-3.5 models, this is 4097 tokens (or 8001 for code-davinci-002).
b) The length limit applies to the combined input + output tokens.
c) When the Playground is used (in either complete or chat mode), the length limit applies to the entire session.
d) When the API is used (either completions or chat/completions), each API call is independent of the others, so the length limit applies to a single API call (see the chunking sketch after these points).
Additional point, thanks to @raymonddavey:
e) There is also a small overhead that needs to be deducted from the usable tokens in the chat models.
The AI consumes a few tokens for the formatting of each message row. There are also a couple of tokens for the request as a whole.
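Given point (d), one practical way to handle a long input like the 50,000-word document from my original question is to split it into chunks that each fit the window, extract themes per chunk with independent API calls, and merge the results. A rough sketch, assuming tiktoken is installed; the 3,000-token chunk size is my arbitrary choice, leaving room in the 4097-token limit for instructions and output:

import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def chunk_text(text, chunk_tokens=3000):
    # Encode once, then slice the token stream into fixed-size windows.
    tokens = enc.encode(text)
    for i in range(0, len(tokens), chunk_tokens):
        yield enc.decode(tokens[i:i + chunk_tokens])

Each chunk then goes out as its own call, so no single request exceeds the limit.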
@paul.armstrong : I’d appreciate your thoughts on the above. Thanks again.
Normally this overhead is not an issue, but it's worth adding to your list of points.
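To put rough numbers on that overhead, here is a sketch along the lines of the counting scheme in the OpenAI cookbook. The exact constants vary by model version, so treat the 4-per-message and 2-per-request figures as approximations:

import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def num_tokens(messages):
    total = 0
    for m in messages:
        total += 4  # approximate formatting overhead per message row
        total += len(enc.encode(m["role"])) + len(enc.encode(m["content"]))
    return total + 2  # plus a couple of tokens for the request as a whole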
If you want to find the response token limit for your model you can just do this:
import sys
import openai

# Deliberately ask for an impossibly large completion; the API rejects
# it with an error that reveals the model's response token limit.
response = openai.Completion.create(
    engine="text-davinci-003",
    prompt="What is the future of human civilization?",
    max_tokens=sys.maxsize,
    n=1,
)
Setting max_tokens to sys.maxsize throws an error because the value is too large, but the error message also reports the maximum response token limit. For this engine it was 1500000.
What? 1.5 million is no model’s context length.
Your example obscures what's actually going on.
import openai
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hi"}],  # placeholder message
    max_tokens=12345678,
)
gets you an exception you can handle and parse as a model probing method:
openai.error.InvalidRequestError: This model’s maximum context length is 4097 tokens. However, you requested 12345875 tokens (120 in the messages, 77 in the functions, and 12345678 in the completion). Please reduce the length of the messages, functions, or completion.
Also note the ability to discover function tokens in the error.
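For completeness, a small sketch of that probing method: trigger the error deliberately, catch it, and pull the context length out of the message with a regex. This assumes the pre-1.0 openai Python library used elsewhere in this thread, and that the error text keeps this wording:

import re
import openai

try:
    openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hi"}],
        max_tokens=12345678,
    )
except openai.error.InvalidRequestError as e:
    m = re.search(r"maximum context length is (\d+) tokens", str(e))
    if m:
        print("Context window:", m.group(1))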