MAX_REQUEST is 4096 tokens. Does that mean that, after encoding, only 4096 tokens of the messages are kept and any extra tokens are discarded?
# Note: you need to be using OpenAI Python v0.27.0 for the code below to work
import openai

openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"},
    ],
)
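As a side note, you can see the token accounting for yourself: the response returned by ChatCompletion.create includes a usage field with prompt, completion, and total token counts. A minimal sketch, still on openai v0.27.0:

# Sketch: inspect how many tokens the request and the reply actually used
# (same openai v0.27.0 call as above, just keeping the response object).
import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
    ],
)

usage = response["usage"]
print(usage["prompt_tokens"])      # tokens consumed by the messages
print(usage["completion_tokens"])  # tokens in the generated reply
print(usage["total_tokens"])       # prompt + completion, which must fit in the context window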
According to the note below, does the number of tokens in the request also limit the number of tokens available for the reply?
Note too that very long conversations are more likely to receive incomplete replies. For example, a gpt-3.5-turbo conversation that is 4090 tokens long will have its reply cut off after just 6 tokens.
https://platform.openai.com/docs/guides/chat/managing-tokens
If I understand the above description correctly, the context (the request messages plus the reply) can contain at most 4096 tokens in total.
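That also seems to be what the linked guide implies when it counts tokens before sending a request. A rough sketch based on that guide, using the tiktoken package to estimate the prompt size and then cap the room left for the reply (the per-message overhead values come from the guide and may change for newer model versions):

# Sketch: estimate prompt tokens for gpt-3.5-turbo with tiktoken, then see how
# much of the 4096-token context remains for the completion.
import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo"):
    encoding = tiktoken.encoding_for_model(model)
    num_tokens = 0
    for message in messages:
        num_tokens += 4  # every message is wrapped with <im_start>{role/name}\n{content}<im_end>\n
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":  # if there's a name, the role is omitted
                num_tokens -= 1
    num_tokens += 2  # every reply is primed with <im_start>assistant
    return num_tokens

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
]

MAX_CONTEXT = 4096
prompt_tokens = num_tokens_from_messages(messages)
print(prompt_tokens, "tokens in the prompt,",
      MAX_CONTEXT - prompt_tokens, "tokens left for the reply")

On this reading, a 4090-token conversation leaves only 4096 - 4090 = 6 tokens for the completion, which matches the "cut off after just 6 tokens" example in the docs.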