How does ChatGPT have such a massive token limit?

Can anyone point to a technical explanation of how ChatGPT has such a massive token limit compared to vanilla GPT or even Instruct models? Is it really possible to increase it by such a significant amount through mere finetuning?


Token limit has more to do with compute than fine-tuning. They could be throwing more hardware at ChatGPT or have other improvements that allow it to have a higher token limit.

Does anyone know if it actually has a higher token limit, or does it just seem that way?

Good to see you back!

ETA: Similar thread here…although you were first to post! :wink:


It is possible that it has more tokens, but so far the chat conversations seem to handle a lot more tokens than one would expect. My theory is that it probably has a clever way of building the prompt, combining summarization and semantic search to minimize the tokens.
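To make the "semantic search" half of that theory concrete, here is a minimal sketch of picking the old messages most relevant to the current question, so that only those are placed in the prompt. Everything in it is an assumption for illustration: `most_relevant` is a made-up name, and plain word overlap (Jaccard similarity) stands in for the embedding-based similarity a real system would more likely use. Nothing here is confirmed to be what ChatGPT does.

```python
def word_set(text: str) -> set[str]:
    """Lowercased set of whitespace-separated words in a message."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Word-overlap similarity in [0, 1]; a crude stand-in for embedding similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_relevant(history: list[str], query: str, top_k: int = 2) -> list[str]:
    """Return the top_k past messages that share the most words with the query."""
    query_words = word_set(query)
    ranked = sorted(
        history,
        key=lambda message: jaccard(word_set(message), query_words),
        reverse=True,
    )
    return ranked[:top_k]


history = [
    "my favourite colour is green",
    "the weather is cold today",
    "i have two cats",
]
print(most_relevant(history, "what is my favourite colour", top_k=1))
```

With something like this, a message from far back in the chat could be pulled into the prompt even though most of the conversation in between is left out.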


However, they say in the article below that it does not store “any” info past 4000 tokens :slight_smile:

Does ChatGPT remember what happened earlier in the conversation? | OpenAI Help Center

My other thread shows that past 13,000 tokens (about 11,000 words), with my name logs removed and newlines accounted for, it can repeat back roughly the first 25 words of my very first message. I used Chrome's Find tool to make sure that text appears only there. I also filmed a video of the conversation.


Here’s what ChatGPT has to say:


Agree - clearly clever summarization - no limit over 4k yet, it seems. But very well done - it is incredibly seamless!

I was engineering a prompt, and I found that if the initial prompt was larger than 1k tokens, it started to malfunction. 1008 tokens was fine, but at 1100 tokens it would start to have problems with the commands.

Inserting multiple 1k prompts seems to work better than one big one. This is just my case, but it might be happening to other people.

@PaulBellow I am not sure that is true. I think the length of input + output is something called the “context size”, and it is defined at training time as part of the embedding process. The context size can be changed during fine-tuning, but doing so often leads to performance degradation. Since GPT-3 has a context window of 2048, and ChatGPT is a fine-tuned version of GPT-3, I was surprised to see such high token lengths possible for ChatGPT. Asking around, I think it's probably the case that ChatGPT has a context window of 4096. It will accept more tokens past this point but only reads the first 4096, which would still be consistent with the behavior @immortal.discoveries is reporting.
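A small sketch of the arithmetic behind a fixed context window, under the assumption that the window really is 4096 tokens (that figure is the guess from this post, not a confirmed value). The function names are made up for illustration. It shows how the prompt eats into the room left for the reply, and what keeping only one end of an oversized input would look like.

```python
CONTEXT_WINDOW = 4096  # assumed value, taken from the guess in the post above


def completion_budget(prompt_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for the reply once the prompt has been counted."""
    return max(context_window - prompt_tokens, 0)


def keep_first(tokens: list[str], context_window: int = CONTEXT_WINDOW) -> list[str]:
    """Read only the first context_window tokens and ignore the rest."""
    return tokens[:context_window]


def keep_last(tokens: list[str], context_window: int = CONTEXT_WINDOW) -> list[str]:
    """Read only the most recent context_window tokens instead."""
    return tokens[-context_window:] if context_window > 0 else []


# A 13,000-token conversation, with each token represented by its index.
tokens = [str(i) for i in range(13000)]
print(completion_budget(3000))                      # room left after a 3000-token prompt
print(keep_first(tokens)[0], keep_first(tokens)[-1])
print(keep_last(tokens)[0], keep_last(tokens)[-1])
```

Either way, a plain 4096-token window cannot cover both ends of a 13,000-token conversation at once.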

Say my 13,000-token conversation (a token is a bit less than one word) looks like this:
Above, the square brackets [ ] mark the first 4K, and the parentheses ( ) mark the last 4K. Are you saying it reads the first or the last 4K?

The passage it read back was the farthest point back in our conversation: 13,000 tokens, or about 11,000 words as I recall. It did it, and I don’t know how.

I think the algorithm for ChatGPT feeds back the “last” messages and responses until it runs out of token allowance.

We won’t know for sure, but it keeps recent discussion on track and doesn’t loop back to the start of the overall chat.
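A minimal sketch of that idea, assuming a simple sliding window over the chat history. This is a guess at the behavior described above, not a known implementation. `count_tokens` is a rough stand-in that counts words; a real tokenizer would give different numbers.

```python
def count_tokens(text: str) -> int:
    """Rough stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def recent_messages_within_budget(messages: list[str], token_budget: int) -> list[str]:
    """Keep the newest messages whose combined size fits in token_budget."""
    kept: list[str] = []
    used = 0
    # Walk backwards from the newest message and stop at the first one that does not fit.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > token_budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


conversation = [
    "user: my name is Ada",
    "bot: nice to meet you Ada",
    "user: tell me a joke",
    "bot: why did the chicken cross the road",
]
print(recent_messages_within_budget(conversation, token_budget=13))
```

With a window like this, the oldest messages are the first to drop out, so on its own it would not explain recalling the very start of a long chat.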

I filmed it and know for sure; this was one of the many tests I did (I’m surprised my Reddit post got 0 upvotes, given that I found out it does something at all!):

Yeah, that’s what I suspect as well.

if conversation_size > token_limit:
    # Over the limit: condense the conversation into a summary that fits.
    sum_conv = sum_context_within_token_limit(conversation)

Must be something along those lines.
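Fleshing that pseudocode out into something runnable, as a sketch only: when the conversation is over the limit, older messages are collapsed into a summary and the newest ones are kept verbatim. `build_context` and `placeholder_summarize` are hypothetical names, the summarizer just keeps the first few words (a real system would presumably ask a model to summarize), and tokens are approximated by word counts.

```python
def count_tokens(text: str) -> int:
    """Rough stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def placeholder_summarize(messages: list[str], max_tokens: int) -> str:
    """Hypothetical summarizer: keeps the leading words so the result fits in max_tokens."""
    words = " ".join(messages).split()
    # One token is spent on the "summary:" label itself.
    return " ".join(["summary:"] + words[: max(max_tokens - 1, 0)])


def build_context(messages: list[str], token_limit: int, summary_budget: int) -> list[str]:
    """Return the messages unchanged if they fit, else a summary plus the newest messages."""
    total = sum(count_tokens(m) for m in messages)
    if total <= token_limit:
        return list(messages)

    # Reserve room for the summary, then fill what is left with the newest messages.
    remaining = token_limit - summary_budget
    recent: list[str] = []
    for message in reversed(messages):
        cost = count_tokens(message)
        if cost > remaining:
            break
        recent.insert(0, message)
        remaining -= cost

    older = messages[: len(messages) - len(recent)]
    return [placeholder_summarize(older, summary_budget)] + recent


messages = ["a b c d", "e f g h", "i j k l"]
context = build_context(messages, token_limit=8, summary_budget=4)
print(context)
```

The split between `summary_budget` and the verbatim part is an arbitrary choice here; the point is only that the combined result stays within `token_limit`.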