I was expecting the “gpt-4-1106-preview” model to have a 128K limit for generated tokens. But even in the Playground it’s 4K, lower than any other GPT-4 model. Is that a bug, or did I misunderstand the whole “GPT-4 Turbo” concept?
What use is 128K of input tokens if the output is 4K? That’s useless. If I ask for a summary of a 300-page text, how do they expect the summary to fit in 4K? I am seriously waiting for Google Gemini; I am quite sure that will be the end of OpenAI.
Thanks for clarifying. Well, indeed, that makes this model nearly useless even for reports about document analysis.
The AI isn’t going to write a summary of much more than around 500 tokens anyway. It has been trained to produce “summaries” of a specific form, with curtailed output to reduce model computation for those who aren’t paying by-the-question. Give it a try and count the tokens.
Also count how many tokens are in a 3000 word document…
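As a rough rule of thumb (an approximation only; an exact count needs a real tokenizer such as tiktoken), English runs about 0.75 words per token, so a 3000-word document alone is already around 4K tokens:

```python
# Rough token estimate using the common ~0.75 words-per-token rule of thumb.
# This is an approximation; an exact count requires a real tokenizer.
def estimate_tokens(word_count: int) -> int:
    return round(word_count / 0.75)

# A 3000-word document is ~4000 tokens: by itself it fills a 4K output budget.
print(estimate_tokens(3000))
```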
Where it will disappoint is in those uses where the AI has no latitude to shrink, condense, or crush the output (which it will do if you ask for a rewrite). You’d have to instruct it very specifically to do a task like “correct any spelling or grammar errors within each sentence so the sentence is clearer. You must process each sentence without omission”, and only then might it approach the limit.
ChatGPT max_tokens out? 1536.
You can stick the assistant reply back on to the end of the existing chat, and send it for another API call without a user question though, which will continue writing.
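A minimal sketch of that trick, assuming the OpenAI chat message format (the model name and client call in the final comment are illustrative, not a complete program):

```python
# Sketch: continue a truncated reply by appending it to the history as an
# assistant message, then calling the API again with no new user turn.
def extend_history(messages: list[dict], partial_reply: str) -> list[dict]:
    """Append the truncated assistant reply so the next call keeps writing."""
    return messages + [{"role": "assistant", "content": partial_reply}]

history = [{"role": "user", "content": "Summarize the attached report in detail."}]
history = extend_history(history, "Part 1 of the summary...")
# next call (illustrative, OpenAI Python SDK v1):
# client.chat.completions.create(model="gpt-4-1106-preview", messages=history)
```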
Great news! Google’s Vertex AI has made Chat_Bison_32K (input AND output) Generally Available. I just compared the new ChatGPT4 with Chat_Bison_32K. Guess what? I gave both a document to summarize: ChatGPT4 returned a 600-token output and, whatever I did, it would not exceed it. And because ChatGPT4’s summary was so limited, it was almost useless: most of the data was missing. Google’s Chat_Bison_32K returned 1800 tokens. OpenAI, your end is near!
128K gives more freedom than before. But I’d argue that if you need more than 3000 words (which is about 7 pages’ worth of words at 11-point font) you’d probably be better off chunking the text and sending those chunks to be summarized individually. You could use LangChain’s Summarize API with the map reduce, refine, or map rerank strategies to get a better overall summary. With the new space, instead of having to split the text into 40+ chunks, you could create a summary from a little over 4, depending on the method you use.
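LangChain’s interfaces change between versions, so here is a framework-free sketch of the map-reduce idea; `summarize` is a placeholder for whatever LLM call you actually use:

```python
# Framework-free map-reduce summarization sketch. `summarize` is a stand-in
# for an actual LLM call; chunking here is plain word-count splitting.
def chunk_words(text: str, max_words: int = 3000) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def map_reduce_summary(text: str, summarize) -> str:
    partials = [summarize(chunk) for chunk in chunk_words(text)]  # map step
    return summarize("\n".join(partials))                         # reduce step

# usage (illustrative): map_reduce_summary(big_doc, summarize=my_llm_call)
```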
I have my own framework which works perfectly: it splits material by logical topic and feeds the pieces separately. The point was to have fewer chunks.
Also, this time they separated the input limit and the output limit, so we have to refactor existing systems to account for 128K input but only 4K output. It’s no longer “this is the 8K model and this is the 32K model”; now it’s a “128/4 model”, where the output cap no longer depends on how many input tokens you send.
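One way to model the new asymmetric budgets during such a refactor (a sketch; the limit numbers are the ones discussed in this thread, so verify them against the official docs):

```python
# Minimal model-limit table reflecting the context/output split.
# Figures are those discussed in this thread; check the official docs.
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelLimits:
    context_window: int  # max input + output tokens combined
    max_output: int      # hard cap on generated tokens per call

LIMITS = {
    # Older models: symmetric, output can use the whole context window.
    "gpt-4": ModelLimits(context_window=8_192, max_output=8_192),
    "gpt-4-32k": ModelLimits(context_window=32_768, max_output=32_768),
    # Turbo: huge context, but output capped separately at 4K.
    "gpt-4-1106-preview": ModelLimits(context_window=128_000, max_output=4_096),
}

def max_prompt_tokens(model: str, desired_output: int) -> int:
    """How many prompt tokens remain after reserving room for the reply."""
    lim = LIMITS[model]
    return lim.context_window - min(desired_output, lim.max_output)
```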
I’ll stick to the old model.
To tie this all up, with information from Adam and Logan.
ChatGPT-4 (website) has an input of 32K and an output of 4K
GPT-4-Turbo (API) has an input of 128K and an output of 4K
Older GPT4 models had symmetric input/output token limits.
That makes one wonder: what did ChatGPT Enterprise users “upgrade” to…
The inference was 128K for Enterprise, but I have not asked that question directly.
If you need more than 4K for summary, follow this:
- Add system prompt to summarize
- Add your doc
- Call the API and get 1st part of summary
- (Optional: add more instructions)
- Call the API again to get the second part with all of the history, including previous reply
The output is capped at 4K per call, but nothing prevents sending the previous output back and asking it to continue.
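The steps above can be sketched as a loop, with `call_api` standing in for your chat-completion call (it should take the message list and return the assistant’s text):

```python
# Sketch of the multi-call summary loop. `call_api` is a placeholder for an
# actual chat-completion request; each call's output is capped (~4K tokens),
# so the full summary is assembled across several calls.
def long_summary(doc: str, call_api, parts: int = 3) -> str:
    messages = [
        {"role": "system", "content": "Summarize the document in detail."},
        {"role": "user", "content": doc},
    ]
    chunks = []
    for _ in range(parts):
        reply = call_api(messages)            # one capped completion
        chunks.append(reply)
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": "Continue the summary."})
    return "\n".join(chunks)
```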
This is completely wrong, buddy. ChatGPT-4 (website) has an input of 8k and an output of ~1k. I won’t be surprised if they reduced the input as well, but I can barely make ChatGPT(4) say anything even close to 1k. It’s almost useless. I know many people (myself included) who have cancelled their subscriptions.
Yes, it is quite wrong. ChatGPT has a max_token response limit set on the model of 1536 tokens.
The input context length of ChatGPT is intensely managed: your conversation history is crushed to the point where most users complain about memory problems, and there’s no way OpenAI is putting anything like even 6k back into the model as past conversation. There is a vast difference between ChatGPT and running your own chatbot with the API, which doesn’t intentionally forget.
There is no “symmetric” in ChatGPT, ever. A reservation is set aside only for output, and it is not half the input.
“Symmetric” otherwise doesn’t make sense on any model: the context length of an AI model is a single shared space which can be used dynamically for either input or language generation. That is, until OpenAI decided that users need to be cut off from getting a useful amount of output after paying for a subscription. Then API users have to use a model nerfed by being trained to be only OpenAI’s vision of ChatGPT.
Hit the limit of what ChatGPT can respond with at its unseen maximum token setting, and you’d get a “continue” button, or be able to type “continue”. That is rarely seen any more, solved by AI models that have vastly curtailed output baked into their training. Get it to hit the “continue” limit, paste that output into the tokenizer, and you get the same answer every time.
That’s what employees (and those that roleplay like they are) won’t say.