GPT-4 128K only has 4096 completion tokens

**An unexpected limitation: GPT-4 has 128K context … but not quite so!**

I am running a new experiment with GPT-4 to test the usability of its large context for the task of translation.

While doing so, I kept getting a very short answer from the model when passing it a context of 60K tokens. Initially, I thought the issue was in some function in my code. Eventually, I narrowed the issue down to the output of the model.

The model was returning only 4096 tokens.

I went ahead and set the max_tokens to 60K and that is when I received this error:

โ€œ๐“๐ก๐ข๐ฌ ๐ฆ๐จ๐๐ž๐ฅ ๐ฌ๐ฎ๐ฉ๐ฉ๐จ๐ซ๐ญ๐ฌ ๐š๐ญ ๐ฆ๐จ๐ฌ๐ญ 4096 ๐œ๐จ๐ฆ๐ฉ๐ฅ๐ž๐ญ๐ข๐จ๐ง ๐ญ๐จ๐ค๐ž๐ง๐ฌโ€

I am using the model “gpt-4-1106-preview” and I have confirmed I have a 128K context.

So: the model can receive up to 128K input, but can only output up to 4096 tokens!
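In other words, `max_tokens` has to be capped at the completion limit, not the context window. A minimal sketch of that guard (the helper name is my own; the 4096 figure comes from the API error quoted above):

```python
# Clamp the requested completion budget to the model's hard cap, so the
# API does not reject the request outright. The 4096 limit is what the
# error message reports for gpt-4-1106-preview; the helper is hypothetical.

COMPLETION_LIMIT = 4096  # max completion tokens for gpt-4-1106-preview

def clamp_max_tokens(requested: int, limit: int = COMPLETION_LIMIT) -> int:
    """Return a max_tokens value the API will accept."""
    if requested < 1:
        raise ValueError("max_tokens must be positive")
    return min(requested, limit)

# Requesting 60_000 completion tokens triggers
#   "This model supports at most 4096 completion tokens",
# so cap it before making the call:
safe_max_tokens = clamp_max_tokens(60_000)  # 4096
```

Note this only avoids the error; the output is still capped, so a 60K-token translation has to be produced across multiple requests.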

๐“๐ก๐ข๐ฌ ๐ข๐ฌ ๐š๐ง ๐ข๐ฆ๐ฉ๐จ๐ซ๐ญ๐š๐ง๐ญ ๐ฅ๐ข๐ฆ๐ข๐ญ๐š๐ญ๐ข๐จ๐ง ๐š๐ง๐ ๐š ๐ฌ๐ฎ๐ซ๐ฉ๐ซ๐ข๐ฌ๐ข๐ง๐  ๐ซ๐ž๐ฏ๐ž๐ฅ๐š๐ญ๐ข๐จ๐ง!

Has anyone else found this limitation?

Am I doing something wrong?


The GPT-4-Turbo model has a 4K token output limit; you are doing nothing wrong in that regard.

The more suitable model would be GPT-4-32K, but I am unsure if that is now in general release or not.

If you go to the Playground and ensure you are in Chat mode, then select Models and Show more models, that should give you a list of everything you have access to.


Thanks for the confirmation @Foxalabs. I was trying to find this limitation in the API documentation but was not able to.

Thanks again!



The more recent models are trained and retrained to give short, unsatisfactory answers. That saves tokens and computation for ChatGPT users, but those of us willing to pay for the old quality are kind of hosed.

The AI model is unaware of your max_tokens setting or its available context length. If the token limit were actually reached, the output would be cut off mid-sentence; instead, the AI simply decides it is done with its thought and does not perform your task satisfactorily.

You’ll also get plenty of denials that OpenAI has programmed in via fine-tuning when you try to prompt for more output. Absolutely intentional nerfing.


I’ve noticed that the recently released “gpt-4-1106-preview” model exhibits strange behavior, which seems to be due to some odd fine-tuning.

Despite passing all conversation history as a payload when accessing through the API, the assistant claims it cannot refer to past conversation history for nonsensical reasons.

After I persistently urged it, the assistant finally acknowledged the context length and occasionally referenced past statements.

It feels as if it’s intentionally inflating token counts when accessed via the API.

Also, as noted above, ChatGPT often generates very short responses, yet oddly it generates unnecessarily long ones when accessed via the API.

Furthermore, it is also strange that this does not happen in the Playground.

I also feel this is an intentional nerf.


The -1106 preview is more verbose, up to its 500 tokens or so, but you’ll also find it is less focused on the task and gives less concrete, less substantial answers. It looks more appealing on the surface, but it’s not going to perform your “rewrite 3000 tokens into a different 3000 tokens” task (which is actually a lot of text, more than a chat user might expect to read, but certainly a case for using the model to perform a specific task).
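For long rewrite or translation jobs like that, the usual workaround is to split the source so each completion stays under the 4096-token cap. A rough sketch, assuming about 4 characters per token (a real tokenizer such as tiktoken would be more accurate; all names and constants here are illustrative, not an official API):

```python
# Split a long source text on paragraph boundaries into chunks small
# enough that each translated/rewritten chunk fits in one completion.
# Uses a crude ~4 chars/token heuristic with a safety margin, since
# translations can expand relative to the source.

CHARS_PER_TOKEN = 4          # rough average for English text
COMPLETION_LIMIT = 4096      # per-request completion-token cap
SAFETY_MARGIN = 0.8          # leave headroom below the cap

def chunk_text(text: str, max_completion_tokens: int = COMPLETION_LIMIT) -> list[str]:
    """Greedily pack paragraphs into chunks that fit one completion."""
    budget_chars = int(max_completion_tokens * SAFETY_MARGIN * CHARS_PER_TOKEN)
    chunks, current = [], ""
    for para in text.split("\n\n"):
        # Start a new chunk if adding this paragraph would bust the budget.
        if current and len(current) + len(para) + 2 > budget_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent as its own API request and the translated pieces concatenated afterward; a single paragraph longer than the budget still becomes its own (oversized) chunk, so pathological inputs need extra handling.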


Yes, indeed, language models like this one have use cases beyond just chatting, such as RAG and other language tasks.

What I experienced was specifically related to its use as a chat model.

I acknowledge that I should have paid more attention to the theme here, which was focused on completions.
I will be more careful in the future. Thank you for pointing this out.


Ouch! This is extremely disappointing, as I’ve been awaiting a GPT-4 model with larger capacity to make my new app work…


As this topic has a selected solution, closing topic.