How does the Assistants API (ChatGPT system) handle long context (aggregation of prompts & responses) in a Thread?

Hi there! I’m Harsh & I’m training Small Language Models on my own for hands-on, experiential learning.

I was going through the Assistants API & saw a new approach to preserving context (aggregating prompts & responses) & enhancing the model output via an additional ‘system’ message (carrying extra system information/instructions).
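For reference, here’s a minimal sketch of the flow I mean, assuming the `openai` Python SDK (v1.x); the model name and instructions are just placeholders:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The 'system'-style instructions live on the assistant itself
assistant = client.beta.assistants.create(
    model="gpt-4o",  # placeholder model name
    instructions="You are a helpful tutor.",  # the extra system description
)

# A thread accumulates the prompt/response history for me
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="First question..."
)

# Each run sends the aggregated thread (plus instructions) to the model
run = client.beta.threads.runs.create(
    thread_id=thread.id, assistant_id=assistant.id
)
```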

Here’s a situation to explain my question better.

  • Imagine the LLM powering ChatGPT has a context window of 100 tokens, and consider the situation below.

    • The user sends a prompt of 80 tokens & gets a response of 30 tokens.

    • The user follows up with a prompt of 50 tokens.

    • I’m assuming that when the user follows up with those 50 tokens, a master prompt containing the history (prompt_1, response_1 & prompt_2) is sent to the model (see the sketch after this list).
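In other words, I picture the follow-up turn producing an aggregated message list like this (role names follow the Chat Completions convention; the token counts are from my toy scenario):

```python
# Hypothetical aggregated "master prompt" for the follow-up turn
master_context = [
    {"role": "user", "content": "<prompt_1>"},        # 80 tokens
    {"role": "assistant", "content": "<response_1>"}, # 30 tokens
    {"role": "user", "content": "<prompt_2>"},        # 50 tokens
]
total_tokens = 80 + 30 + 50  # 160 tokens, against a 100-token window
```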

It can clearly be seen that the context window has been exceeded (because the previous prompt & response pairs are aggregated): 80 + 30 + 50 = 160 tokens against a 100-token limit.

So what’s happening at the backend? Does the master_context get truncated?
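If truncation is what happens, I’d imagine something like this naive drop-oldest strategy (purely my guess at one possible approach, not anything confirmed about OpenAI’s backend; the token counts are faked for the demo):

```python
messages = [
    {"role": "user", "content": "<prompt_1>"},        # 80 tokens
    {"role": "assistant", "content": "<response_1>"}, # 30 tokens
    {"role": "user", "content": "<prompt_2>"},        # 50 tokens
]

def truncate_to_window(msgs, max_tokens, count_tokens):
    """Drop the oldest turns until the remainder fits the window.

    Purely a guess at one possible backend strategy; count_tokens is
    a hypothetical tokenizer callback (e.g. built on tiktoken).
    """
    kept = list(msgs)
    while kept and sum(count_tokens(m["content"]) for m in kept) > max_tokens:
        kept.pop(0)  # discard the oldest turn first
    return kept

# Toy demo with the numbers from my scenario, faking the token counts:
fake_counts = {"<prompt_1>": 80, "<response_1>": 30, "<prompt_2>": 50}
trimmed = truncate_to_window(messages, 100, fake_counts.get)
# -> prompt_1 dropped; response_1 + prompt_2 = 30 + 50 = 80 tokens, fits.
```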

Looking forward to your insights.
Cheers!