I have begun to see this challenge in a slightly different light. Perhaps I'm just getting my sea legs, but the aha moments are coming fast now.
The goal is to ensure the AI entity has at least a short-term memory: a context that spares users from repeating themselves. This is more about UX than AI, per se. In my experience, memory begins to degrade as soon as you are forced to truncate the conversation to stay within token or cost limits.
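To make that failure mode concrete, here is a minimal sketch of the kind of oldest-first trimming that causes it. The function name and the rough four-characters-per-token estimate are my own placeholders, not any particular library's API:

```python
def trim_history(history: list[dict], max_tokens: int = 2000) -> list[dict]:
    """Keep only the most recent turns that fit the budget; older context is lost."""
    def rough_tokens(msg: dict) -> int:
        return len(msg["content"]) // 4  # crude ~4 chars-per-token estimate

    kept, used = [], 0
    for msg in reversed(history):  # walk from newest to oldest
        cost = rough_tokens(msg)
        if used + cost > max_tokens:
            break  # everything older than this turn gets chopped
        kept.append(msg)
        used += cost
    return list(reversed(kept))
```

Once the first question falls off the end of that window, the model has no way to resolve a follow-up like the one below.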
To overcome this, I have used the following approach with pretty good results. I would love to hear others' thoughts on it.
The user asks:

> How many reservations have been sold to date?

The very next question provides no context except the time frame:

> How many in 2021?
The process can be achieved with separate calls to the API: one to update the context and another to run the latest query against that context. I have also experimented with a single inference that updates the context and addresses the latest query without revealing the updated context in the response.
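Here is a rough sketch of the two-call version, assuming a generic completion endpoint; `complete()`, the prompt wording, and the Q/A context format are placeholders, not any vendor's actual API:

```python
def complete(prompt: str) -> str:
    """Placeholder for a call to whatever completion API you use."""
    raise NotImplementedError

def answer_with_context(context: str, question: str) -> tuple[str, str]:
    # Call 1: fold the new question into the running context so it stands
    # alone ("How many in 2021?" -> "How many reservations were sold in 2021?").
    rewrite_prompt = (
        f"Conversation so far:\n{context}\n\n"
        f"Rewrite the user's latest question so it is fully self-contained:\n"
        f"{question}"
    )
    standalone = complete(rewrite_prompt)

    # Call 2: answer the now self-contained question.
    answer = complete(f"Answer the question:\n{standalone}")

    # Carry the updated context forward for the next turn.
    new_context = f"{context}\nQ: {standalone}\nA: {answer}"
    return answer, new_context
```

The single-inference variant collapses both steps into one prompt that asks the model to update the context internally and return only the answer.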
Lastly, I have also seen this dialog unfold without carrying any context from message to message. I think three things make that possible (see the sketch after this list):
- The “model” corpus is based on embeddings and serves as a closed environment.
- The AI has no room to drift; the context is pre-established and logically pins down what the user is referring to.
- The embedding vector matching is tight; it leaves little room for misunderstanding.
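A sketch of why that tight matching holds up, assuming a closed, pre-embedded corpus; `embed()` is a placeholder for your embedding model, and the corpus entries and counts are invented for illustration:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: return a unit-length embedding vector for `text`."""
    raise NotImplementedError

# A closed corpus: every answerable fact is pre-embedded up front.
corpus = [
    "Total reservations sold to date: 1,204",  # invented figure
    "Reservations sold in 2021: 312",          # invented figure
]
corpus_vecs = [embed(doc) for doc in corpus]

def best_match(query: str) -> str:
    q = embed(query)
    # Cosine similarity; with unit-length vectors this is just a dot product.
    scores = [float(q @ v) for v in corpus_vecs]
    return corpus[int(np.argmax(scores))]
```

Because the corpus is closed, even the terse follow-up "How many in 2021?" has exactly one nearby neighbor, so the match resolves the reference that the missing conversational context would otherwise supply.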