An unexpected limitation: GPT-4 has 128K context … but not quite so!
I am running a new experiment with GPT-4 to test the usability of its large context for the task of translation.
While doing so, I kept getting very short answers from the model when passing it a context of 60K tokens. Initially, I thought the issue was in some function in my code, but eventually I narrowed it down to the output of the model.
The model was returning only 4096 tokens.
I went ahead and set the max_tokens to 60K and that is when I received this error:
“This model supports at most 4096 completion tokens”
I am using the model “gpt-4-1106-preview” and I have confirmed I have a 128K context.
So: the model can receive up to 128K input, but can only output up to 4096 tokens!
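Since the context window and the completion cap are separate limits, one workaround is to clamp the requested `max_tokens` before sending the request, so the API does not reject it outright. A minimal sketch, where the cap table is an assumption based on the error message quoted above, not an authoritative list:

```python
# The context window (128K for gpt-4-1106-preview) and the completion
# cap (4,096 per the API error above) are separate limits. This cap
# table is an assumption taken from the thread, not official data.
COMPLETION_CAPS = {"gpt-4-1106-preview": 4096}

def clamp_max_tokens(model: str, requested: int) -> int:
    """Clamp a requested max_tokens to the model's completion cap,
    returning the request unchanged for unknown models."""
    cap = COMPLETION_CAPS.get(model)
    return requested if cap is None else min(requested, cap)

print(clamp_max_tokens("gpt-4-1106-preview", 60_000))  # → 4096
```

Requesting 60K completion tokens, as above, would then be silently reduced to 4,096 instead of producing the error.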
This is an important limitation and a surprising revelation!
Has anyone else found this limitation?
Am I doing something wrong?
The GPT-4-Turbo model has a 4K token output limit, you are doing nothing wrong in that regard.
The more suitable model would be GPT-4-32K, but I am unsure if that is now in general release or not.
If you go to the playground https://platform.openai.com/playground?mode=chat
and ensure you are in Chat mode, then select Models and Show more models; that should give you a list of everything you have access to.
Thanks for the confirmation @Foxabilo - I was trying to find this limitation in the documentation of the API but have not been able to.
The models have been retrained more recently to give short, unsatisfactory answers. That saves tokens and computation for ChatGPT users, but those of us willing to pay for quality (of olde) are kind of hosed.
The AI model is unaware of your max_tokens setting or its available context length. If the token limit were actually reached, the output would be cut off mid-sentence; instead, the AI is simply done with its thought and not performing your task satisfactorily.
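Those two cases can be told apart from the `finish_reason` field on each choice in the API response: `"length"` means the completion was cut off at max_tokens, while `"stop"` means the model ended on its own. A small helper, sketched against plain response dicts rather than a live API call:

```python
def was_truncated(choice: dict) -> bool:
    """True if the completion stopped because it hit max_tokens
    (finish_reason == "length") rather than ending naturally
    (finish_reason == "stop")."""
    return choice.get("finish_reason") == "length"

# Example fragments shaped like chat-completion choices:
print(was_truncated({"finish_reason": "length"}))  # → True
print(was_truncated({"finish_reason": "stop"}))    # → False
```

A `"stop"` on a disappointingly short answer confirms the model chose to end there, which is the "done with its thought" case described above.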
You’ll also get plenty of denials that OpenAI has programmed into the fine-tuning when you try to prompt for more output. Absolutely intentional nerfing.
I’ve noticed that the recently released “gpt-4-1106-preview” model exhibits strange behavior, which seems to be due to some odd fine-tuning.
Despite passing all conversation history as a payload when accessing through the API, the assistant claims it cannot refer to past conversation history for nonsensical reasons.
After I persistently urged it, the assistant finally acknowledged the context length and occasionally referenced past statements.
It feels as if it’s intentionally inflating token counts when accessed via the API.
Also, as noted above, ChatGPT often generates very short responses, so it is odd that the same model generates unnecessarily long ones when accessed via the API.
Furthermore, it is also strange that this does not happen with Playground.
I also feel this is an intentional nerf.
The -1106 preview is more verbose, up to its 500 tokens or so, but you’ll also find it is less focused on the task, giving less concrete and substantial answers. It looks more appealing on the surface, but it’s not going to perform your “rewrite 3000 tokens into a different 3000 tokens” task (which is actually a lot of text, more than a chat user might expect to read, but certainly a case for using the model to perform a specific task).
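For large rewrite or translation jobs like the one above, the usual workaround for the 4,096-token completion cap is to chunk the input so each piece's output fits under the limit. A rough sketch that splits on paragraph boundaries; the 4-characters-per-token ratio is a heuristic assumption (a real tokenizer such as tiktoken gives accurate counts):

```python
def chunk_text(text: str, max_tokens: int = 3000,
               chars_per_token: int = 4) -> list[str]:
    """Split text on blank-line paragraph boundaries so each chunk
    stays within a rough character budget derived from max_tokens.
    A single oversized paragraph is kept whole rather than split."""
    budget = max_tokens * chars_per_token
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > budget:
            chunks.append(current)
            current = para
        else:
            current = current + "\n\n" + para if current else para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be translated in its own API call with a comfortable max_tokens, and the results concatenated.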
Yes, indeed, language models like this one have use cases beyond just chatting, such as RAG and other language tasks.
What I experienced was specifically related to its use as a chat model.
I acknowledge that I should have paid more attention to the theme here, which was focused on completions.
I will be more careful in the future. Thank you for pointing this out.
Ouch! This is extremely disappointing, as I’ve been awaiting a GPT-4 model with larger capacity to make my new app work…