Determining if the user has changed a subject

Hi Guys,

I’m working on a personal assistant project.
After quite a bit of work, I’ve built a fine-tuned model capable of handling dozens of different tasks in a pretty nice way (labeling data, classifying, summarizing, etc.).
However, I have one major problem with handling previous information.

Let’s assume two use cases:

  1. Interacting with a model for product technical assistance.
  2. Instructing the model for data labeling (the user says “call mom,” and the model should label that as a “phone function” to the contact “mom”).

For use case #1, we need to keep the prompt’s past interactions - a user is asking for help, the model asks for clarification, the user replies, and the model provides the final answer.

For use case #2, we need the model to “forget” everything so that previous interactions don’t affect the completion. For example, the user instructs the model to make a phone call, and the model is supposed to label the information for further use. If the model “remembers” the previous interactions, it might label the wrong name or make other kinds of errors.

On the other hand, if the user asks for help, the model asks for clarification, and the user replies, but we have cleared the prompt, then the model loses the context of the conversation and we have a different problem.

So far, my approach has been to use classification to determine whether the user has switched to a new topic, and from that whether to clear the prompt. It turned out to help only partially, since short answers from the user (“yes,” “I agree,” “do it”…) might be falsely classified as a new topic, and vice versa.
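Roughly, the topic-change check I’ve been using looks something like this (the embedding model and the threshold below are placeholders, not necessarily my exact setup):

```python
# Sketch of a topic-change check via embedding similarity; the model name and
# threshold are illustrative placeholders, not values from this project.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def seems_like_new_topic(last_turns, new_message, threshold=0.35):
    """Return True when the new message looks unrelated to the recent turns."""
    context = " ".join(last_turns[-4:])              # only the most recent turns
    ctx_vec = encoder.encode(context, convert_to_tensor=True)
    msg_vec = encoder.encode(new_message, convert_to_tensor=True)
    return util.cos_sim(ctx_vec, msg_vec).item() < threshold
```

A reply like “yes” or “do it” carries almost no signal of its own, so its similarity to the context is basically noise, which is where the misclassification comes from.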

I’ll be happy to hear any ideas on effectively handling this conflict, i.e., when to clear the prompt and when to keep the previous information, or any other suggestions for the architecture.
One related question: can we influence how strongly the model relies on the prompt versus the knowledge stored in its fine-tuned weights?
Note: I have to use temperature 0 to avoid providing the user with misinformation.

4 Likes

The issue is more technical than semantic.
I’ll explain with two different tasks - “set a reminder” and “set a meeting.”
The two commands should result in different replies - one contains a time and a subject, the other contains participants, a location, and more.
After fine-tuning, both work well with a cleared prompt.
But if the user just set a reminder and now tries to set a meeting, the model gets confused and sets a reminder again instead of a meeting.
So we need a method to determine when to clear the prompt and when not to clear it (or any other way to prevent confusion).
How would you do that?
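For now, the most concrete thing I can think of is routing on a per-message command classifier and resetting the prompt whenever the command changes; a rough sketch (the keyword rules are only a stand-in for a real classifier):

```python
# Rough sketch: classify each message's command independently and clear the
# prompt whenever the detected command changes. The keyword rules below are a
# naive stand-in for a fine-tuned or dedicated classifier.
def classify_command(message):
    text = message.lower()
    if "meeting" in text:
        return "set_meeting"
    if "remind" in text:
        return "set_reminder"
    return "other"

def build_prompt(history, message, previous_command):
    command = classify_command(message)
    if command != previous_command:
        history = []                                 # new command -> start from a clean prompt
    history.append(f"User: {message}")
    return "\n".join(history), command
```

But short follow-up answers (“yes,” “at noon”) would still get misrouted, which is the same problem as before.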

1 Like

Exactly, unless the model asks for clarification, for example, “who would participate?”
For that you need to keep the previous prompts and completions.
But that, of course, can cause confusion if the user DOES change the subject.
I’m starting to think that I’ll need a separate model just to analyze whether the conversation is still on track.
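One guard I’m considering: if the model’s last turn ended with a question, keep the prompt regardless of what the topic check says, since a short reply is almost certainly answering it. A sketch, with the similarity check passed in as a callable:

```python
# Sketch of a "pending clarification" guard: never clear the prompt while the
# assistant's last turn was a question. The history format ("Assistant: ..." /
# "User: ..." lines) and the injected is_new_topic callable are assumptions.
def should_clear_prompt(history, new_message, is_new_topic):
    last_assistant_turn = next(
        (turn for turn in reversed(history) if turn.startswith("Assistant:")), ""
    )
    if last_assistant_turn.rstrip().endswith("?"):
        return False                                 # a clarification is pending
    return is_new_topic(history, new_message)
```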

3 Likes

This is more like it! @daveshapautomator has a book on this area of research.

3 Likes

Haha, I’m just at the beginning of the book. Can you point me to the right page?
EDIT: OK, I’m reading, and it seems that evaluation is an important part, although in this case it looks more like a second ‘intelligence’ evaluating the first one rather than internal evaluation. Anyway, @daveshapautomator, can you please suggest a prompt to evaluate an existing prompt?

3 Likes

Oh man, y’all stepped on what is going to be one of the hardest problems in creating AGI! Basically, the entire human brain is recruited for these kinds of attention tasks, so our first forays are going to be extremely sloppy. I’m sure some computer scientists will come up with better models eventually but right now this is the Wild West.

In his book On Task, David Badre talks extensively about how our brains use cognitive control to handle multiple tasks, track them to completion, and so on. Really, you only need to read the first two or three chapters to see how complex a problem this is.

Anyways, I’ve been putting thought into how to handle task switching and, specifically, carrying tasks through to completion. I would caution against brute-force methods. The reason is that you will end up building a custom solution that only works for one particular architecture and then breaks as soon as you make other structural changes. In effect, I recommend you create a solution that can engage with an arbitrary data source, such as a SQL database or a search index like ElasticSearch or SOLR.

It just so happens I was reading up on txtai, which should aid us greatly in this problem.

Let us assume that you have a database of all past user interactions in something like SQLite or ElasticSearch. Every human could handle the query “Call mom” if they are talking with someone they are close to, say a spouse or a sibling.
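As a minimal illustration (the schema is just my own guess at what you’d want, not a prescription), logging every interaction to SQLite could look like this:

```python
# Hypothetical interaction log in SQLite; table layout and field names are
# assumptions for illustration.
import sqlite3
import time

conn = sqlite3.connect("assistant.db")
conn.execute("""CREATE TABLE IF NOT EXISTS interactions
                (ts REAL, user TEXT, role TEXT, content TEXT)""")

def log_turn(user, role, content):
    conn.execute("INSERT INTO interactions VALUES (?, ?, ?, ?)",
                 (time.time(), user, role, content))
    conn.commit()

def recent_turns(user, limit=20):
    cursor = conn.execute("""SELECT role, content FROM interactions
                             WHERE user = ? ORDER BY ts DESC LIMIT ?""",
                          (user, limit))
    return list(reversed(cursor.fetchall()))         # oldest first
```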

First you have to establish intention - I recommend establishing intent with every new message (be it chat, verbal, etc.). This is the first method of detecting changes in subject/task. I have some intent prompt examples in the Appendix of my book. So with a good intent extractor you’ll get something like “End user NSY wants me to call their mother.”
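A bare-bones version of that extraction step might look like the following; the prompt wording and engine choice are purely illustrative and not the examples from the book:

```python
# Hedged sketch of per-message intent extraction with a completion prompt.
# Prompt text, engine name and output format are assumptions; assumes
# openai.api_key is already set.
import openai

INTENT_PROMPT = """Extract the user's intention as one short sentence.

Message: {message}
Intention:"""

def extract_intent(user, message):
    response = openai.Completion.create(
        engine="davinci",                            # or a fine-tuned model
        prompt=INTENT_PROMPT.format(message=message),
        max_tokens=40,
        temperature=0,
        stop=["\n"],
    )
    intention = response.choices[0].text.strip()
    return f"End user {user} wants me to {intention}"
```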

This intent raises a few questions and must be mapped to a few information problems:

  1. Who is NSY’s mother?
  2. What is NSY’s mother’s phone number?

Fortunately for you, I just wrote a fine-tuned question generator that can assist with this exact kind of problem! (internal questions are one way of articulating what our brains do when engaging with tasks, also, curiosity)

You must then use semantic search over your data to answer those questions. If the information isn’t available, you must generate a dialog to ask the end user. This method of breaking tasks into constituent information problems is basically how the brain works: the brain holds each problem in working memory.
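With txtai, that lookup over stored information can be quite short; the indexed facts below are invented purely for illustration:

```python
# Sketch of answering the generated questions with txtai semantic search.
# The stored "facts" and the model path are invented for this example.
from txtai.embeddings import Embeddings

embeddings = Embeddings({"path": "sentence-transformers/nli-mpnet-base-v2"})

facts = [
    "NSY's mother is named Ruth",                    # hypothetical stored knowledge
    "Ruth's phone number is 555-0100",
]
embeddings.index([(i, text, None) for i, text in enumerate(facts)])

def answer(question, limit=1):
    results = embeddings.search(question, limit)     # list of (id, score) pairs
    return [facts[uid] for uid, _ in results]

# answer("What is NSY's mother's phone number?") should surface the phone number;
# if the top score is low, fall back to asking the user.
```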

Anyways, I’m working on a better Shared Database service as outlined in the book. I’ll post about it when I’m done. It should help with these kinds of problems immensely, as it will have integrated semantic search, greatly enhancing its ability to answer questions.

EDIT: One thing I did wrong with the existing question asking dataset is that I didn’t include usernames. I will have to fix that one day.

4 Likes

Wow, that’s a great reply. Thanks a lot, @daveshapautomator!
It seems I’m not that far from where you’re suggesting I go.
My fine-tuned model can usually respond with the right completion.
I implemented a semantic search filter to detect a subject change and clear the prompt when the subject has changed. The problem remains that a very short or vague user prompt can cause the semantic search to misclassify the intention and clear the prompt when it isn’t needed, or vice versa.
However, adding a middle step where the main model analyzes the user’s intention seems intriguing, as it can break the pattern that caused the error (although it will cost more tokens). I’ll give it a try!
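Roughly what I have in mind for that middle step (the helper names are placeholders for whatever intent extractor and similarity check end up being used):

```python
# Sketch of the proposed middle step: extract the intent first, then run the
# topic-change check on the extracted intent rather than on the raw, often very
# short, user message. Both helpers are injected; their names are hypothetical.
def handle_message(history, message, extract_intent, is_new_topic):
    intent = extract_intent(message)                 # one extra completion per turn
    if is_new_topic(history, intent):
        history = []                                 # fresh task -> start from a clean prompt
    history.append(f"User: {message}")
    return history, intent
```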

2 Likes

Yeah so the Question Generator can help with vague queries by giving you the ability to ask follow-up and clarifying questions.

I had this idea that I wanted to make a chatbot that ONLY asks questions. I think such a thing could still be quite useful.

2 Likes

Do you think such a “self-assessment” task can be performed by a fine-tuned Ada or Babbage, or can only Curie or Davinci do it?
Otherwise it will get quite costly, as the completions will become far longer.

I doubt it. CURIE seems to be the bottom end of what is capable of generating more open-ended output. I would expect BABBAGE to be able to handle more complex labeling tasks, but not necessarily generating salient questions. I will say, though, that a fine-tuned CURIE seems to outperform general purpose DAVINCI. So perhaps a fine-tuned BABBAGE could outperform a vanilla CURIE. For the cost savings it would certainly be worth trying. I might go back to my question generator and do a side by side. I guess I should have done that anyway.

2 Likes

@daveshapautomator Did you ever go back and try a Babbage version of the question generator? I stumbled upon your question generator project and am curious how far it could go. Thanks!

1 Like

Hey @daveshapautomator, two years and GPT-4 later, what’s your take on this today?

1 Like