What exactly is "MAX TOKENS" in gpt-3.5-turbo model?

https://platform.openai.com/docs/models/gpt-3-5

MAX_REQUEST is 4096 tokens. Does that mean the messages will keep only 4096 tokens after encoding, with any extra tokens discarded?

# Note: you need to be using OpenAI Python v0.27.0 for the code below to work
import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ]
)
print(response["choices"][0]["message"]["content"])

According to the guidance below, does the number of tokens in the request also limit the number of tokens available for the reply?

Note too that very long conversations are more likely to receive incomplete replies. For example, a gpt-3.5-turbo conversation that is 4090 tokens long will have its reply cut off after just 6 tokens.

https://platform.openai.com/docs/guides/chat/managing-tokens

According to the above description, if I understand correctly, the context can hold at most 4096 tokens in total, shared between the request and the reply.
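For anyone who wants to verify this, the managing-tokens guide linked above counts request tokens with tiktoken roughly as follows. This is a sketch based on that guide; the per-message overhead values (4 per message, 2 to prime the reply) are approximations for gpt-3.5-turbo and may change between model versions.

import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo"):
    # Approximate the token count of a chat request, per the linked guide.
    encoding = tiktoken.encoding_for_model(model)
    num_tokens = 0
    for message in messages:
        num_tokens += 4  # each message carries a few tokens of framing overhead
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
    num_tokens += 2  # every reply is primed with assistant framing tokens
    return num_tokens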

No, messages are not discarded automatically; you need to manage the messages list yourself. Here is a solution from a related issue that shows how you can do that.
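As one illustration of managing the list yourself, here is a minimal sketch that keeps the system prompt and drops the oldest messages until the request fits a chosen budget. It reuses the num_tokens_from_messages helper sketched above; the trim_messages name and the 3000-token budget are invented for this example, not an official API.

def trim_messages(messages, budget=3000, model="gpt-3.5-turbo"):
    # Keep the system prompt at index 0; discard the oldest exchanges first,
    # leaving headroom below the 4096-token context for the reply.
    trimmed = list(messages)
    while len(trimmed) > 1 and num_tokens_from_messages(trimmed, model) > budget:
        del trimmed[1]
    return trimmed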


The screenshot, along with some other OpenAI documentation, is a bit mixed up in its nomenclature.

An AI model engine has a context length. This is the size of the AI’s available memory area that is preloaded with your question, and also where the answer must be generated following that input.

It is measured in tokens, the sub-word units the model reads: common words are often a single token, while longer or rarer words are split into several, averaging about 1.3 to 1.5 tokens per word for Western languages.

The max_tokens parameter used by the API reserves an area of the context length that can be used only for the generated answer, and it thereby sets the maximum size of the response you can receive back. The amount of input you can send to the AI model is then only what remains after that reservation is subtracted.
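A concrete illustration of that arithmetic, with made-up numbers:

# Illustrative only: reserving max_tokens for the answer shrinks the input budget.
context_length = 4096   # gpt-3.5-turbo context window
max_tokens = 500        # reserved for the generated reply
input_budget = context_length - max_tokens
print(input_budget)     # 3596 tokens left for the prompt messages

Passing max_tokens=500 to openai.ChatCompletion.create would then cap the reply at 500 tokens, and a request whose messages encode to more than 3596 tokens would be rejected by the API.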
