I am running a new experiment with GPT-4 to test how usable its large context is for translation.
While doing so, I kept getting very short answers from the model when passing it a 60K-token context. Initially, I thought the issue was in some function in my code. Eventually, I narrowed it down to the output of the model.
The model was returning at most 4096 tokens.
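A quick way to see how far a prompt is beyond such a limit is a rough token estimate. The ~4-characters-per-token figure below is only a heuristic (an exact count needs the model's tokenizer, e.g. tiktoken), and the function name is mine:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate for English text.

    OpenAI's rule of thumb is roughly 4 characters (~0.75 words)
    per token; exact counts require the model's tokenizer.
    """
    return max(1, len(text) // 4)

# A 60K-token context dwarfs a 4K-token completion cap:
context = "some source text to translate " * 10000
print(estimate_tokens(context))
```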
I went ahead and set max_tokens to 60K, and that is when I received this error:
The GPT-4-Turbo model has a 4K-token output limit; you are doing nothing wrong in that regard.
The more suitable model would be GPT-4-32K, but I am unsure whether that is in general release yet.
If you go to the playground https://platform.openai.com/playground?mode=chat
and ensure you are in Chat mode, then select Models and Show more models, that should give you a list of everything you have access to.
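The same list can also be pulled from the API. This is only a sketch assuming the openai v1.x Python client and an API key in the environment; the `gpt4_variants` helper is just an illustrative filter:

```python
def gpt4_variants(model_ids):
    """Filter a list of model IDs down to the GPT-4 family (illustrative)."""
    return sorted(m for m in model_ids if m.startswith("gpt-4"))

# With the openai v1.x client (requires OPENAI_API_KEY), something like:
# from openai import OpenAI
# client = OpenAI()
# ids = [m.id for m in client.models.list()]
# print(gpt4_variants(ids))
```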
The more recent models have been trained and retrained to give short, unsatisfying answers. That saves tokens and compute for ChatGPT users, but those of us willing to pay for quality (of olde) are kind of hosed.
The AI model is unaware of your max_tokens setting or its available context length. If the token limit were actually being reached, the output would be cut off mid-sentence; instead, the AI is simply done with its thought and is not performing your task satisfactorily.
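That distinction is visible in the API response: a completion cut off by max_tokens comes back with `finish_reason == "length"`, while one the model chose to end has `finish_reason == "stop"`. A minimal check (the helper name is mine; the field names follow the Chat Completions response format):

```python
def hit_token_limit(finish_reason: str) -> bool:
    """True if the completion was truncated by max_tokens rather than
    ending because the model finished its answer."""
    return finish_reason == "length"

# Typical use with the openai v1.x client (sketch, not run here):
# resp = client.chat.completions.create(model="gpt-4-1106-preview", ...)
# if hit_token_limit(resp.choices[0].finish_reason):
#     print("Output was truncated mid-generation.")
```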
You'll also get plenty of denials that OpenAI has programmed into its fine-tuning when you try to prompt for more output. Absolutely intentional nerfing.
I've noticed that the recently released "gpt-4-1106-preview" model exhibits strange behavior, which seems to be due to some odd fine-tuning.
Despite passing all conversation history as a payload when accessing through the API, the assistant claims it cannot refer to past conversation history for nonsensical reasons.
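For context, this is roughly how history is normally passed: the model only "remembers" what is in the messages list on each call. A minimal sketch, with names of my own choosing, following the Chat Completions message format:

```python
def build_messages(system_prompt, history, user_input):
    """Assemble a Chat Completions payload. Nothing persists between
    API calls; the model sees only what this list contains."""
    messages = [{"role": "system", "content": system_prompt}]
    for user_turn, assistant_turn in history:
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": assistant_turn})
    messages.append({"role": "user", "content": user_input})
    return messages
```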
After persistent urging, the assistant finally acknowledged its context length and occasionally referenced past statements.
It feels as if it's intentionally inflating token counts when accessed via the API.
Also, as noted above, ChatGPT often generates very short sentences, yet oddly it generates unnecessarily long ones when accessed via the API.
Furthermore, it is also strange that this does not happen in the Playground.
The -1106 preview is more verbose, up to its 500 tokens or so, but you'll also find it is less focused on the task, giving less concrete or substantial answers. It looks more appealing on the surface, but it's not going to perform your "rewrite 3000 tokens into a different 3000 tokens" task (which is actually a lot of text, more than a chat user might expect to read, but certainly a case for using the model to perform a specific task).
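For a rewrite task that big, one common workaround is to split the input into pieces that each fit comfortably under the output cap and process them separately. A minimal sketch, assuming the ~4-chars-per-token heuristic and splitting on paragraph boundaries (the function and its defaults are mine):

```python
def chunk_text(text: str, max_chars: int = 8000) -> list[str]:
    """Split text into chunks on paragraph boundaries so each rewrite
    call stays under the 4K-token output cap (~4 chars per token,
    so ~8000 chars leaves headroom for the rewritten output)."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own completion request and the rewritten pieces concatenated afterwards.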
Yes, indeed, language models like this one have use cases beyond just chatting, such as RAG and other language tasks.
What I experienced was specifically related to its use as a chat model.
I acknowledge that I should have paid more attention to the theme here, which was focused on completions.
I will be more careful in the future. Thank you for pointing this out.