Is the "output (Maximum length)" for the GPT-4-1106-preview API still capped at 4095?

Thank you for announcing the GPT-4-1106-preview API. I’m very moved and grateful. There’s talk that it can process a context of 128k, but is the output still limited to 4095 tokens?

From what I have researched so far, there are two pieces of information:

1. Unless an OpenAI account reaches Tier 4, the 300,000 TPM (tokens per minute) limit cannot be utilized, which means that without reaching Tier 4, one cannot test the capability of producing outputs around 120k.
2. Regardless of the tier reached, the output is capped at 4095 tokens.

Which of these pieces of information is correct? I would like to know.

The output was never capped. Someone is just looking at a slider control in the API playground that doesn’t go higher than it did for the old model.

The playground slider might not go higher right now. The playground is also not where API developers who write their own software interact with AI models.

API users have a rate limit: the number of tokens they can send to and request from AI models per minute. If a single API request is bigger than your rate limit for an entire minute of use, you will be blocked until you reduce your input or your max_tokens value, or until your limit is raised through purchasing more credit from OpenAI.
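To make the rate-limit point concrete, here is a rough sketch of the check described above. It assumes the limiter counts the prompt tokens plus the requested max_tokens against the per-minute budget; the actual accounting on OpenAI's side may differ, and the function name and numbers are made up for illustration.

```python
def request_fits_tpm(prompt_tokens: int, max_tokens: int, tpm_limit: int) -> bool:
    """Return True if one request could fit inside a single minute's token budget.

    Assumption: the limiter counts prompt tokens plus the requested max_tokens.
    """
    return prompt_tokens + max_tokens <= tpm_limit


# Hypothetical numbers: a 120k-token prompt asking for 4,000 output tokens.
print(request_fits_tpm(120_000, 4_000, tpm_limit=300_000))  # True
print(request_fits_tpm(120_000, 4_000, tpm_limit=40_000))   # False
```

In the second case the request alone exceeds the whole minute's budget, so retrying will not help; only a smaller input, a smaller max_tokens, or a higher limit will.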

Sorry, but you are wrong. There is a real 4k output limit. It even says it in the docs here: OpenAI Platform

It is a shame, though. Having a massive 128k context window but only 4k generations. Really limits the possibilities.

Yes, I was wrong a week ago. OpenAI has since clarified this: the new model will never generate more than 4k tokens of output.
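In practice that means the largest max_tokens worth requesting is bounded by both the output cap and whatever room the prompt leaves in the context window. A minimal sketch, assuming a 128,000-token context window and a 4,096-token output cap (the thread only says "128k" and "4k"/4095, so treat both constants as assumptions and check the model docs); the helper name is hypothetical.

```python
CONTEXT_WINDOW = 128_000  # assumed total budget (prompt + output) for the "128k" model
OUTPUT_CAP = 4_096        # assumed output cap; the thread quotes "4k" / 4095


def usable_max_tokens(prompt_tokens: int, requested: int) -> int:
    """Clamp a requested max_tokens to the output cap and the remaining context."""
    room = CONTEXT_WINDOW - prompt_tokens
    return max(0, min(requested, OUTPUT_CAP, room))


print(usable_max_tokens(100_000, 120_000))  # 4096: the output cap wins
print(usable_max_tokens(126_000, 4_096))    # 2000: the context window wins
```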

Another limitation for you to find (besides rate limits): the API endpoint returns an error when you try to send more than 32768 characters to any model.