At some point, gpt-3.5-turbo-1106 switched to limiting max_tokens to 4096. I believe it should be higher, since I think max_tokens covers both prompt and generation tokens, and gpt-3.5-turbo-1106 has a 16,385-token context window if I am correct.
I used to be able to pass max_tokens of at least 10,000, but now I get an error. I understand that the model will only return a maximum of 4,096 output tokens, but I believe the max_tokens parameter itself should still accept higher values.
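For reference, here is roughly the call that used to work (a minimal sketch using the v1 openai Python SDK; the prompt text is just a placeholder):

```python
# Sketch of the request that now fails; prompt content is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    messages=[{"role": "user", "content": "Summarize this long document ..."}],
    max_tokens=10_000,  # previously accepted; now rejected with a 400 error
)
print(response.choices[0].message.content)
```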
Yes, all the 1106 models, i.e. gpt-3.5 and gpt-4, have this limitation. I wonder if it has to do with capacity issues. I am hoping that the non-preview/production versions will remove this limitation, but that is a long shot, I think. These new models seem like one step forward, two steps back. Of course, for the majority of use cases, 4,096 output tokens may be ample.
Hey, although the original post and the comment make sense, I just wanted to ask for a quick confirmation.
I am using gpt-3.5-turbo-1106 (I can use gpt-4-1106-preview too) with the chat.completions API, with the response_format parameter set to return a JSON object. I’ve noticed that the JSON output is limited to 4096 tokens in my use case. Does this mean GPT-3.5 and GPT-4 can only produce a maximum of 4096 tokens with the chat.completions API?
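Here is roughly what my call looks like (a minimal sketch with placeholder prompts; when the output hits the cap, finish_reason comes back as "length" and the truncated JSON usually fails to parse):

```python
# Sketch of a JSON-mode call; prompts are placeholders. Note that JSON mode
# requires the word "JSON" to appear somewhere in the messages.
from openai import OpenAI
import json

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Reply with a JSON object."},
        {"role": "user", "content": "List ten countries with their capitals."},
    ],
    max_tokens=4096,  # anything higher is rejected for the 1106 models
)

choice = response.choices[0]
# finish_reason == "length" means the response was cut off at the token cap,
# which typically leaves the JSON incomplete.
print(choice.finish_reason)
data = json.loads(choice.message.content)
```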