I get the error below when I overflow the context window of the OpenAI API endpoint for gpt-3.5-turbo: it says the maximum prompt (“context”) length is 4097 tokens.
However, the documentation says that the context limit is 4096. I’ve encountered the same discrepancy for gpt-3.5-turbo-0301 and gpt-3.5-turbo-0613 as well.
My question is, which one – 4096 or 4097 – should I use as a constant in my code?
I would prefer to trust the number from the API error message, but I know that whenever the documentation changes I’ll end up overwriting my constants with whatever values I see there, hence the question.
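For context, this is roughly how the constant ends up in my code; a minimal sketch, assuming tiktoken for counting and the rough per-message overhead numbers from the OpenAI cookbook token-counting recipe (the names are my own):

```python
import tiktoken

# Which value should this be: 4096 (docs) or 4097 (error message)?
MAX_CONTEXT_TOKENS = 4096  # conservative choice: the documented value

def completion_budget(messages, model="gpt-3.5-turbo"):
    """Estimate how many completion tokens remain, using the rough
    counting recipe from the OpenAI cookbook: ~4 framing tokens per
    message plus 3 priming tokens for the assistant's reply."""
    enc = tiktoken.encoding_for_model(model)
    prompt_tokens = 3  # reply priming
    for message in messages:
        prompt_tokens += 4  # per-message framing (role, separators)
        for value in message.values():
            prompt_tokens += len(enc.encode(value))
    return MAX_CONTEXT_TOKENS - prompt_tokens

messages = [{"role": "user", "content": "Hello!"}]
print(completion_budget(messages))  # pass this as max_tokens
```

If the real limit is 4097, the conservative constant wastes a token on every request; if it’s 4096, using 4097 risks an error on requests that are exactly at the boundary.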
The API error message seems to be the more accurate of the two, because I can send a prompt of exactly 4097 tokens and still receive a response (“completion”) that’s 1 token long, resulting in a total token count of 4098:
(Can’t include second screenshot since new users are limited to one per post)
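Since I can’t attach the second screenshot, here is roughly the probe I ran instead; a minimal sketch, assuming the 0.x openai Python client, with a chat-framing overhead estimate that is my own guess rather than anything official:

```python
import openai    # assuming the 0.x Python client
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def probe(target_prompt_tokens, overhead=7):
    """Send a request whose prompt totals `target_prompt_tokens`.
    `overhead` is my estimate of the chat framing (~4 tokens for the
    message plus 3 priming tokens for the reply)."""
    filler = " x" * (target_prompt_tokens - overhead)
    # sanity-check my assumption that each " x" encodes to one token
    assert len(enc.encode(filler)) == target_prompt_tokens - overhead
    try:
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": filler}],
            max_tokens=1,
        )
        print(target_prompt_tokens, "->", response["usage"]["total_tokens"], "total")
    except openai.error.InvalidRequestError as err:
        print(target_prompt_tokens, "->", err)

probe(4097)  # accepted for me; usage reports 4098 total tokens
probe(4098)  # rejected: "maximum context length is 4097 tokens"
```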
I’m thinking the website documentation has a typo, since for other models the documented max context length matches the error message I get when I exceed it. If that’s the case, is this forum the right avenue for reporting bugs?
But if it’s not a typo, why would the numbers be different?