Why is gpt-3.5-turbo-1106 max_tokens limited to 4096?

At some point, gpt-3.5-turbo-1106 switched to limiting max_tokens to 4,096. I believe the limit should be higher, since I think max_tokens covers both prompt and generation tokens, and gpt-3.5-turbo-1106 has a 16,385-token context window if I'm correct.

I used to be able to pass a max_tokens of at least 10,000, but now I get an error. I understand the model will only return a maximum of 4,096 output tokens, but I believe max_tokens itself should be allowed to go higher.
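For reference, here's a minimal sketch of the kind of request that now fails (the prompt and the exact error wording are just illustrative):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Asking for more than 4,096 completion tokens is now rejected with a
# 400 error (roughly: "max_tokens is too large: 10000. This model
# supports at most 4096 completion tokens").
response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    messages=[{"role": "user", "content": "Write a very long story."}],
    max_tokens=10000,  # this used to be accepted; now it errors out
)
```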

Has anyone else seen this issue?

edit: I now see this in the docs:

max_tokens = The maximum number of tokens to generate in the chat completion.

Does it no longer include prompt tokens?
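If max_tokens now only counts completion tokens, the budgeting would work roughly like this. A sketch using tiktoken, assuming its cl100k_base tokenizer matches the server-side count (it won't account for the few extra tokens the chat format adds):

```python
import tiktoken

CONTEXT_WINDOW = 16385   # gpt-3.5-turbo-1106 total context (prompt + completion)
MAX_COMPLETION = 4096    # hard cap on generated tokens for this model

enc = tiktoken.encoding_for_model("gpt-3.5-turbo-1106")
prompt = "Summarize the following document: ..."
prompt_tokens = len(enc.encode(prompt))

# max_tokens only needs to cover the completion, but prompt + completion
# must still fit within the context window.
max_tokens = min(MAX_COMPLETION, CONTEXT_WINDOW - prompt_tokens)
print(f"prompt: {prompt_tokens} tokens, safe max_tokens: {max_tokens}")
```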


Yes, all the 1106 models, i.e. gpt-3.5 and gpt-4, have this limitation. I wonder if it has to do with capacity issues. I'm hoping the non-preview/production versions will remove the limitation, but I think that's a long shot. These new models feel like one step forward, two steps back. Of course, for the majority of use cases, 4,096 output tokens may be ample.
