What exactly is "MAX TOKENS" in gpt-3.5-turbo model?

https://platform.openai.com/docs/models/gpt-3-5

MAX_REQUEST is 4096 tokens. Does that mean the messages will keep only 4096 tokens after encoding, with any extra tokens discarded?

# Note: you need to be using OpenAI Python v0.27.0 for the code below to work
import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ]
)
print(response["choices"][0]["message"]["content"])

According to the guidance below, does the number of tokens in the request also limit the number of tokens available for the reply?

Note too that very long conversations are more likely to receive incomplete replies. For example, a gpt-3.5-turbo conversation that is 4090 tokens long will have its reply cut off after just 6 tokens.

https://platform.openai.com/docs/guides/chat/managing-tokens

According to the above description, if I understand correctly, the context can hold at most 4096 tokens in total, shared between the request and the reply.
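For anyone who wants to verify this, the managing-tokens guide linked above counts request tokens with tiktoken roughly as follows. This is a sketch based on that guide; the per-message overhead values (4 per message, 2 to prime the reply) are approximations for gpt-3.5-turbo and may change between model versions.

import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo"):
    # Approximate the token count of a chat request, per the linked guide.
    encoding = tiktoken.encoding_for_model(model)
    num_tokens = 0
    for message in messages:
        num_tokens += 4  # each message carries a few tokens of framing overhead
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
    num_tokens += 2  # every reply is primed with assistant framing tokens
    return num_tokens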

No, messages are not discarded automatically; you need to manage the messages list yourself. Here is a solution from a related issue that shows how you can do that.
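As one illustration of managing the list yourself, here is a minimal sketch that keeps the system prompt and drops the oldest messages until the request fits a chosen budget. It reuses the num_tokens_from_messages helper sketched above; the trim_messages name and the 3000-token budget are invented for this example, not an official API.

def trim_messages(messages, budget=3000, model="gpt-3.5-turbo"):
    # Keep the system prompt at index 0; discard the oldest exchanges first,
    # leaving headroom below the 4096-token context for the reply.
    trimmed = list(messages)
    while len(trimmed) > 1 and num_tokens_from_messages(trimmed, model) > budget:
        del trimmed[1]
    return trimmed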


The screenshot, along with some other OpenAI documentation, is a bit mixed up in its nomenclature.

An AI model engine has a context length. This is the size of the AI’s available memory area that is preloaded with your question, and also where the answer must be generated following that input.

It is measured in tokens, the sub-word units the model reads: common words are often a single token, while longer or rarer words are split into several, averaging about 1.3 to 1.5 tokens per word for Western languages.

The max_tokens parameter used by the API reserves an area of the context length that can be used only for the generated answer, and it thereby sets the maximum size of the response you can receive back. The amount of input you can send to the AI model is then only what remains after that reservation is subtracted.
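A concrete illustration of that arithmetic, with made-up numbers:

# Illustrative only: reserving max_tokens for the answer shrinks the input budget.
context_length = 4096   # gpt-3.5-turbo context window
max_tokens = 500        # reserved for the generated reply
input_budget = context_length - max_tokens
print(input_budget)     # 3596 tokens left for the prompt messages

Passing max_tokens=500 to openai.ChatCompletion.create would then cap the reply at 500 tokens, and a request whose messages encode to more than 3596 tokens would be rejected by the API.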
