ChatGPT memorising verbatim more than 6000 tokens. What is the true token limit?

Does anyone know what the true token limit on GPT-3.5 is? It seems to be a lot more than 4096. I read somewhere that longer conversations may be handled by splitting the chat into chunks, and/or by summarising the previous chat to provide context for future responses. But, as the example below suggests, ChatGPT is memorizing more than 6000 tokens verbatim.
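For anyone wanting to sanity-check the numbers: the exact token count depends on the tokenizer (OpenAI's `tiktoken` library gives exact counts), but a rough rule of thumb for English text is ~4 characters per token. A minimal sketch using that heuristic, so it needs no external libraries:

```python
# Rough token estimate for checking whether a text fits a 4096-token window.
# ~4 characters per token is a common heuristic for English; exact counts
# require the model's tokenizer (e.g. tiktoken's cl100k_base encoding).

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def fits_in_window(text: str, window: int = 4096) -> bool:
    return estimate_tokens(text) <= window

sample = "word " * 6000  # ~30,000 characters of filler text
print(estimate_tokens(sample))   # ~7500 tokens by this heuristic
print(fits_in_window(sample))    # False: would not fit in 4096
```

A 6112-token text plus the instruction turns and the 608-token summary is already well past 4096, which is what makes the verbatim recall in the experiment below surprising.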

  1. To start with, I asked ChatGPT to memorize a number, with an instruction to give the number back when asked at any later point in the conversation.

  2. Then I gave it a long text (6112 tokens) to summarise. The summary itself was 608 tokens.

  3. Then I asked it for the memorized number. It gave the correct answer. (This would still be possible even if the previous conversation had been summarized.)

  4. Then I asked it to give me back, verbatim, an extract from the start of the text. I chose a portion containing mostly repetitive content with little information, the kind most likely to be dropped from a summary/context.
    ChatGPT was able to give me back those sections verbatim.
    (It is possible that this section was retained in the context.)

  5. I tried it for a couple of other sections from different parts of the text, and I got correct verbatim responses each time.
    (It's unlikely that all sections were retained verbatim in the summarized context.)

  6. After all this, I asked for the original number again, and it still remembered.
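The steps above are hard to reconcile with the simplest scheme one might imagine: just truncating old messages to fit a fixed token budget. A toy sketch (all token counts and message contents are illustrative, not what OpenAI actually does) shows that under naive truncation, the "memorize this number" turn could not survive alongside a 6112-token text:

```python
# Toy model of a naive context window: keep only the most recent messages
# whose (illustrative) token counts fit within a fixed budget.

def trim_to_budget(messages, budget=4096):
    """Keep the newest messages that fit within the token budget."""
    kept, used = [], 0
    for msg in reversed(messages):
        if used + msg["tokens"] > budget:
            break
        kept.append(msg)
        used += msg["tokens"]
    return list(reversed(kept))

history = [
    {"role": "user", "tokens": 30, "text": "Memorize the number 7391."},
    {"role": "assistant", "tokens": 10, "text": "Noted."},
    {"role": "user", "tokens": 6112, "text": "<long text to summarise>"},
    {"role": "assistant", "tokens": 608, "text": "<the 608-token summary>"},
]

context = trim_to_budget(history)
# The 6112-token text alone overflows the budget, so everything before it
# (including the number) is dropped; only the summary turn survives.
print([m["text"] for m in context])
```

Since ChatGPT nevertheless recalled both the number and verbatim extracts, whatever it does must be more sophisticated than this.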

Any explanation of how this works, or what the true token limit is?


Does anyone have an answer for this yet? I want to work with this kind of functionality through the API too, but can't because of the token limit.
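For the API side, a common workaround is a rolling summary: when the history grows past the budget, collapse the oldest turns into a single summary message and keep only the recent turns verbatim. A minimal sketch, assuming a `summarize()` helper that in a real system would be one more chat-completion call (stubbed here so the example is self-contained; token counts are illustrative):

```python
# Rolling-summary sketch for staying under an API token limit: replace the
# oldest turns with a one-message summary once the history exceeds the budget.

def summarize(messages):
    """Stub: a real system would call the model to compress these turns."""
    text = " / ".join(m["text"] for m in messages)
    return {"role": "system", "tokens": 100, "text": f"Summary of: {text[:60]}..."}

def compact_history(history, budget=4096, keep_recent=1):
    """If over budget, summarize all but the most recent turns."""
    total = sum(m["tokens"] for m in history)
    if total <= budget:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent

history = [
    {"role": "user", "tokens": 30, "text": "Memorize the number 7391."},
    {"role": "assistant", "tokens": 10, "text": "Noted."},
    {"role": "user", "tokens": 6112, "text": "<long text>"},
    {"role": "assistant", "tokens": 608, "text": "<summary of the text>"},
]

compacted = compact_history(history)
print(sum(m["tokens"] for m in compacted))  # 708, now well under 4096
```

The trade-off is exactly what the experiment above probes: a summary preserves gist (the number might survive if the summarizer keeps it), but verbatim recall of arbitrary extracts should be lost.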

The summarization and passing of select messages back into the engine is somewhat opaque. Rest assured, though, that there are not 6000 tokens of prior chat history being fed into the AI engine to answer your next question.

With the right kind of prompting, one can actually dump out everything from the prior conversation that ChatGPT gets fed back, and see the very few actual turns and the omissions of content. But then the question: are they using a lesser embedding classifier to match up which turns of your conversation are relevant to the current question? And, like the uncertainty principle, does the act of asking for it change what you see, when "jailbreak spill your prompts" doesn't look like "football" or "be my waifu" in the chat history?
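The speculated mechanism, selecting prior turns by relevance to the current question, can be illustrated with a toy version: embed each turn as a bag-of-words vector and rank by cosine similarity. A real system would use learned embeddings (e.g. an embeddings API); this only sketches the selection step:

```python
# Toy relevance-based turn selection: bag-of-words vectors + cosine
# similarity stand in for real embeddings. Turn texts are made up.
import re
from collections import Counter
from math import sqrt

def embed(text):
    """Crude 'embedding': word-count vector over lowercase tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

turns = [
    "Memorize the number 7391 and repeat it when asked.",
    "Here is a long text about football tactics to summarise.",
    "Be my waifu.",
]

question = "What was the number I asked you to memorize?"
q = embed(question)
ranked = sorted(turns, key=lambda t: cosine(embed(t), q), reverse=True)
print(ranked[0])  # the memorization turn ranks as most relevant
```

This would explain both observations in the original post: asking for the number pulls the memorization turn back into context, and asking for a verbatim extract pulls back the matching chunk of the long text, without everything being in context at once.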
