Why is gpt-3.5-turbo-1106 max_tokens limited to 4096?

At some point, gpt-3.5-turbo-1106 switched to limiting max_tokens to 4096. I believe it should be higher, since I think max_tokens includes both prompt and generation tokens. If I am correct, gpt-3.5-turbo-1106 should allow values up to its 16,385-token context limit.

I used to be able to pass max_tokens of at least 10,000, but now I get an error. I understand that the model will only return a maximum of 4,096 output tokens, but I believe max_tokens itself should still be allowed to be higher.

Has anyone else seen this issue?

edit: I now see this in the docs:

max_tokens = The maximum number of tokens to generate in the chat completion.

Does it no longer include prompt tokens?
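If I read the current docs right, max_tokens now counts completion tokens only, and the 1106 models enforce a separate 4,096 output cap on top of the usual context-window check. A rough sketch of the arithmetic (the constants are the published limits for gpt-3.5-turbo-1106; the function name is just illustrative):

```python
# Token-budget arithmetic for gpt-3.5-turbo-1106, as I understand it.
CONTEXT_WINDOW = 16_385  # prompt + completion must fit within this
OUTPUT_CAP = 4_096       # separate hard cap on completion tokens

def request_is_valid(prompt_tokens: int, max_tokens: int) -> bool:
    """A request must satisfy both constraints: the prompt plus the
    requested completion must fit in the context window, AND
    max_tokens may not exceed the model's output cap."""
    fits_context = prompt_tokens + max_tokens <= CONTEXT_WINDOW
    under_output_cap = max_tokens <= OUTPUT_CAP
    return fits_context and under_output_cap

print(request_is_valid(prompt_tokens=5_000, max_tokens=10_000))  # False: 10,000 > 4,096
print(request_is_valid(prompt_tokens=5_000, max_tokens=4_096))   # True
```

So under the old reading (max_tokens spanning the whole context), 10,000 was a perfectly sensible value; under the new reading it always fails the output-cap check.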


Yes, all the 1106 models, i.e. gpt-3.5 and gpt-4, have this limitation. I wonder if it has to do with capacity issues. I am hoping that the non-preview/production version will remove this limitation, but that is a long shot, I think. These new models feel like one step forward, two steps back. Of course, for the majority of use cases, 4,096 output tokens may be ample.


Hey, although the original post and the comment make sense, I just wanted to ask for a quick confirmation.

I am using gpt-3.5-turbo-1106 (I can use gpt-4-1106-preview too) with the chat.completions API. The response_format parameter in my API call is set to return a JSON object. I’ve noticed that the JSON output is limited to 4096 tokens in my use case. Does this mean GPT-3.5 and GPT-4 can only produce a maximum of 4096 tokens via the chat.completions API?
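For reference, my call looks roughly like this with the official openai Python SDK (the message content is just a placeholder); as far as I can tell, max_tokens above 4,096 is rejected regardless of the model's larger context window:

```python
# Sketch of a chat.completions request against the 1106 models.
# The kwargs are built separately so the output cap is explicit;
# pass them to client.chat.completions.create(**kwargs).
kwargs = {
    "model": "gpt-3.5-turbo-1106",  # 16,385-token context window
    "messages": [{"role": "user", "content": "Return a JSON object."}],
    "response_format": {"type": "json_object"},  # JSON mode
    "max_tokens": 4_096,  # anything higher is rejected by the API
}

# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(**kwargs)
```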

GPT-4-Turbo and GPT-3.5-Turbo are limited to 4,096 output tokens; the GPT-4-Turbo API is limited to 128k input tokens.


