Hi there! I’m Harsh & I’m training Small Language Models on my own for some hands-on, experiential learning.
I was going through the assistants API & noticed how it preserves context (by aggregating prompts & responses) & steers the model’s output with an additional ‘system’ message (carrying extra system-level information/description).
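For reference, here’s the kind of role-tagged message structure I mean (a minimal Python sketch; the contents are just placeholders):

```python
# The conversation is sent as a list of role-tagged messages; the extra
# "system" entry carries the additional description/instructions.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "prompt_1"},
    {"role": "assistant", "content": "response_1"},
    {"role": "user", "content": "prompt_2"},
]
```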
Here’s a situation to explain my question better.
Imagine the LLM powering ChatGPT has a context window of 100 tokens, and consider the following:

- The user sends a prompt of 80 tokens & gets a response of 30 tokens.
- The user follows up with a prompt of 50 tokens.
- I’m assuming that when the user sends the follow-up, a master prompt containing the whole history (prompt_1, response_1 & prompt_2) is sent to the model.

Clearly the context length is exceeded: aggregating the previous prompt & response pair with the new prompt gives 80 + 30 + 50 = 160 tokens, well over the 100-token window.
So what’s happening at the backend? Does the master_context get truncated?
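Here’s what I’m picturing, as a minimal Python sketch. To be clear, this is purely my assumption: the naive whitespace token count (a real system would use a proper tokenizer like tiktoken) and the oldest-turn-first eviction are stand-ins, not the actual backend logic.

```python
# Rough sketch of the sliding-window truncation I'm assuming happens.
# Assumptions (not from any real backend): whitespace-split token
# counting, and a 100-token budget matching my toy example above.

CONTEXT_WINDOW = 100

def count_tokens(text: str) -> int:
    """Very rough stand-in for a real tokenizer."""
    return len(text.split())

def build_master_prompt(messages: list[dict], budget: int = CONTEXT_WINDOW) -> list[dict]:
    """Drop the oldest non-system messages until the history fits the budget."""
    kept = list(messages)
    while sum(count_tokens(m["content"]) for m in kept) > budget:
        # Keep the system message pinned; evict the oldest user/assistant turn.
        idx = 1 if kept and kept[0]["role"] == "system" else 0
        if idx >= len(kept):
            break  # nothing left to evict
        kept.pop(idx)
    return kept

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "prompt_1 ..."},         # imagine ~80 tokens
    {"role": "assistant", "content": "response_1 ..."},  # imagine ~30 tokens
    {"role": "user", "content": "prompt_2 ..."},         # imagine ~50 tokens
]
print(build_master_prompt(history))
```

Is it something like this, or does the backend do something smarter (summarising older turns, for example)?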
Looking forward to your thoughts.
Cheers!