At some point, gpt-3.5-turbo-1106 switched to capping max_tokens at 4,096. I believe the cap should be higher, since I think max_tokens covers both prompt and generation tokens, and gpt-3.5-turbo-1106 has a 16,385-token context window if I'm correct.
I used to be able to pass at least 10,000 as max_tokens, but now I get an error. I understand the model will only return a maximum of 4,096 output tokens, but I believe the allowed max_tokens value should be higher.
Has anyone else seen this issue?
edit: I now see this in the docs:
max_tokens = The maximum number of tokens to generate in the chat completion.
Does it no longer include prompt tokens?
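For anyone else hitting this: a small sketch of how the two limits would interact under that reading of the docs (max_tokens counts only generated tokens, while prompt + completion together must still fit the context window). The function name and logic here are illustrative, not part of the API; only the two numeric limits come from the model's published specs.

```python
# Illustrative sketch, not the actual API validation code.
# Figures are for gpt-3.5-turbo-1106: 16,385-token context window,
# 4,096-token output cap.
CONTEXT_WINDOW = 16_385    # total budget: prompt tokens + completion tokens
MAX_OUTPUT_TOKENS = 4_096  # hard cap on generated (completion) tokens

def request_would_be_accepted(prompt_tokens: int, max_tokens: int) -> bool:
    """Check a hypothetical request against both limits, assuming
    max_tokens now counts only generated tokens."""
    if max_tokens > MAX_OUTPUT_TOKENS:
        return False  # max_tokens itself is rejected above 4,096
    if prompt_tokens + max_tokens > CONTEXT_WINDOW:
        return False  # prompt plus completion must fit the context window
    return True

print(request_would_be_accepted(1_000, 10_000))  # False: what used to work now errors
print(request_would_be_accepted(1_000, 4_096))   # True: within both limits
```

Under this interpretation, passing max_tokens=10,000 fails not because the prompt is too long but because the parameter now exceeds the output cap on its own.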