What exactly is "MAX TOKENS" for the gpt-3.5-turbo model?


The listed maximum (MAX_REQUEST) for gpt-3.5-turbo is 4096 tokens. Does that mean the messages are truncated to 4096 tokens after encoding, and any extra tokens are discarded?
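The limit is counted in encoded tokens, not characters or words, so checking how close a conversation is to 4096 means counting tokens for every message. The sketch below is only an approximation under stated assumptions: `TOKENS_PER_MESSAGE` is an assumed per-message overhead for the chat format (the real value depends on the model version), and `whitespace_count` is a hypothetical stand-in tokenizer. In practice you would plug in a real tokenizer such as the `cl100k_base` encoding from OpenAI's `tiktoken` library.

```python
from typing import Callable, Dict, List

# Assumed per-message overhead for role markers and separators.
# The exact figure varies by model version; treat it as an approximation.
TOKENS_PER_MESSAGE = 4


def count_message_tokens(
    messages: List[Dict[str, str]],
    count_tokens: Callable[[str], int],
) -> int:
    """Approximate how many tokens a list of chat messages occupies."""
    total = 0
    for message in messages:
        total += TOKENS_PER_MESSAGE
        # Both the role and the content are encoded and counted.
        for value in message.values():
            total += count_tokens(value)
    return total


def whitespace_count(text: str) -> int:
    """Hypothetical stand-in tokenizer: counts whitespace-separated words.
    A real BPE tokenizer will usually report a different number."""
    return len(text.split())


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
    {"role": "user", "content": "Where was it played?"},
]
print(count_message_tokens(messages, whitespace_count))  # 46 with the stand-in tokenizer
```

Swapping `whitespace_count` for a real encoder's `len(encoding.encode(text))` gives a much closer estimate without changing the rest of the function.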

# Note: you need to be using OpenAI Python v0.27.0 for the code below to work
import openai

# Every message below is encoded and counts toward the model's context limit.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"},
    ],
)
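The response itself reports how many tokens were used, which is a direct way to see how the prompt and the reply share the limit. In v0.27 the returned object behaves like a dict, with a `usage` section (`prompt_tokens`, `completion_tokens`, `total_tokens`) and a `finish_reason` on each choice, where `"length"` means the reply stopped because a token limit was hit. The helper below is a hypothetical sketch, and `fake_response` is made-up data shaped like a real response, used only so the example runs without an API key; the 4096 limit is passed in as an assumed default.

```python
from typing import Any, Dict


def summarize_usage(response: Dict[str, Any], context_limit: int = 4096) -> Dict[str, Any]:
    """Summarize token usage from a ChatCompletion-style response dict."""
    usage = response["usage"]
    finish_reason = response["choices"][0]["finish_reason"]
    return {
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
        "total_tokens": usage["total_tokens"],
        # "length" means generation stopped because a token limit was reached.
        "cut_off": finish_reason == "length",
        "tokens_left_in_context": context_limit - usage["total_tokens"],
    }


# Hypothetical values, shaped like a real response, for illustration only.
fake_response = {
    "choices": [
        {"message": {"role": "assistant", "content": "..."}, "finish_reason": "length"}
    ],
    "usage": {"prompt_tokens": 4090, "completion_tokens": 6, "total_tokens": 4096},
}
print(summarize_usage(fake_response))
```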

According to the note below, does the number of tokens in the request also limit the number of tokens in the reply?

Note too that very long conversations are more likely to receive incomplete replies. For example, a gpt-3.5-turbo conversation that is 4090 tokens long will have its reply cut off after just 6 tokens.
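The arithmetic in that note is simply the context size minus the tokens already used by the prompt. A minimal sketch, assuming a 4096-token window shared by prompt and reply (the function name is hypothetical):

```python
def max_reply_tokens(prompt_tokens: int, context_limit: int = 4096) -> int:
    """Tokens left for the reply once the prompt has used part of the window."""
    return max(0, context_limit - prompt_tokens)


print(max_reply_tokens(4090))  # 6, matching the example in the note
print(max_reply_tokens(1000))  # 3096
print(max_reply_tokens(5000))  # 0: the prompt alone exceeds the window
```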


If I understand the description above correctly, a single context, meaning the prompt and the reply together, can hold at most 4096 tokens (4090 + 6 = 4096 in the example).
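If prompt and reply share one 4096-token window, a long conversation has to be shortened on the client side to leave room for an answer. One common approach, sketched here as an assumption rather than anything the API does for you, is to drop the oldest non-system messages until the prompt fits. `trim_history`, its `count_tokens` callback, and the default `reply_reserve` of 500 tokens are all hypothetical choices for illustration.

```python
from typing import Callable, Dict, List


def trim_history(
    messages: List[Dict[str, str]],
    count_tokens: Callable[[Dict[str, str]], int],
    context_limit: int = 4096,
    reply_reserve: int = 500,  # arbitrary amount kept free for the reply
) -> List[Dict[str, str]]:
    """Drop the oldest non-system messages until the prompt leaves
    `reply_reserve` tokens free. System messages are kept and moved
    to the front, which matches the usual message order."""
    budget = context_limit - reply_reserve
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and sum(count_tokens(m) for m in system + rest) > budget:
        rest.pop(0)  # discard the oldest turn first
    return system + rest
```

Dropping whole turns keeps each remaining message intact; summarizing old turns instead is another option, at the cost of an extra request.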