Determining if the user has changed a subject

Hi Guys,

I’m working on a personal assistant project.
After quite a bit of work, I’ve built a fine-tuned model capable of handling dozens of different tasks in a pretty nice way (labeling data, classifying, summarizing, etc.).
However, I have one major problem with handling previous information.

Let’s assume two use cases:

  1. Interacting with a model for product technical assistance.
  2. Instructing the model for data labeling (the user says “call mom,” and the model should label that as a “phone function” to the contact “mom”).

For use case #1, we need to keep the prompt’s past interactions - a user is asking for help, the model asks for clarification, the user replies, and the model provides the final answer.

For use case #2, we need the model to “forget” everything so that previous interactions don’t affect the completion. For example, the user instructs the model to make a phone call, and the model is supposed to label the information for further use. If the model “remembers” the previous interactions, it might label the wrong name or make other kinds of errors.

On the other hand, if the user asks for help, the model asks for clarification, and the user replies, but we have cleared the prompt, then the model loses the context of the conversation and we have a different problem.

So far, my approach has been to use classification to determine whether the user has switched to a new topic, and from that whether to clear the prompt. It turned out to help only partially, since short answers from the user (“yes,” “I agree,” “do it”…) might be falsely classified as a new topic, and vice versa.
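Roughly, the topic-change check I’ve been using looks something like this (the embedding model and the threshold below are placeholders, not necessarily my exact setup):

```python
# Sketch of a topic-change check via embedding similarity; the model name and
# threshold are illustrative placeholders, not values from this project.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def seems_like_new_topic(last_turns, new_message, threshold=0.35):
    """Return True when the new message looks unrelated to the recent turns."""
    context = " ".join(last_turns[-4:])              # only the most recent turns
    ctx_vec = encoder.encode(context, convert_to_tensor=True)
    msg_vec = encoder.encode(new_message, convert_to_tensor=True)
    return util.cos_sim(ctx_vec, msg_vec).item() < threshold
```

A reply like “yes” or “do it” carries almost no signal of its own, so its similarity to the context is basically noise, which is where the misclassification comes from.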

I’ll be happy to hear any ideas on effectively handling this conflict, i.e., when to clear the prompt and when to keep the previous information, or any other suggestions for the architecture.
One related question: can we influence how strongly the model relies on the prompt versus the knowledge stored in its fine-tuned weights?
Note: I have to use temperature 0 to avoid providing the user with misinformation.

4 Likes

The issue is more technical than semantic.
I’ll explain with two different tasks - “set a reminder” and “set a meeting.”
The two commands should result in different replies - one contains a time and a subject, the other contains participants, a location, and more.
After fine-tuning, both work well with a cleared prompt.
But if the user just set a reminder and now tries to set a meeting, the model gets confused and sets a reminder again instead of a meeting.
So we need a method to determine when to clear the prompt and when not to clear it (or any other way to prevent confusion).
How would you do that?
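For now, the most concrete thing I can think of is routing on a per-message command classifier and resetting the prompt whenever the command changes; a rough sketch (the keyword rules are only a stand-in for a real classifier):

```python
# Rough sketch: classify each message's command independently and clear the
# prompt whenever the detected command changes. The keyword rules below are a
# naive stand-in for a fine-tuned or dedicated classifier.
def classify_command(message):
    text = message.lower()
    if "meeting" in text:
        return "set_meeting"
    if "remind" in text:
        return "set_reminder"
    return "other"

def build_prompt(history, message, previous_command):
    command = classify_command(message)
    if command != previous_command:
        history = []                                 # new command -> start from a clean prompt
    history.append(f"User: {message}")
    return "\n".join(history), command
```

But short follow-up answers (“yes,” “at noon”) would still get misrouted, which is the same problem as before.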

1 Like

Exactly, unless the model asks for clarification, for example, “who would participate?”
For that you need to keep the previous prompts and completions.
But that, of course, can cause confusion if the user DOES change the subject.
I’m starting to think that I’ll need a separate model just to analyze whether the conversation is still on track.
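One guard I’m considering: if the model’s last turn ended with a question, keep the prompt regardless of what the topic check says, since a short reply is almost certainly answering it. A sketch, with the similarity check passed in as a callable:

```python
# Sketch of a "pending clarification" guard: never clear the prompt while the
# assistant's last turn was a question. The history format ("Assistant: ..." /
# "User: ..." lines) and the injected is_new_topic callable are assumptions.
def should_clear_prompt(history, new_message, is_new_topic):
    last_assistant_turn = next(
        (turn for turn in reversed(history) if turn.startswith("Assistant:")), ""
    )
    if last_assistant_turn.rstrip().endswith("?"):
        return False                                 # a clarification is pending
    return is_new_topic(history, new_message)
```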

3 Likes

This is more like it! @daveshapautomator has a book on this area of research.

3 Likes

Haha, I’m just at the beginning of the book. Can you point me to the right page?
EDIT: OK, I’m reading, and it seems that evaluation is an important part, although in this case it looks more like a second ‘intelligence’ evaluating the first one rather than internal evaluation. Anyway, @daveshapautomator, can you please suggest a prompt to evaluate an existing prompt?

3 Likes

Oh man, y’all stepped on what is going to be one of the hardest problems in creating AGI! Basically, the entire human brain is recruited for these kinds of attention tasks, so our first forays are going to be extremely sloppy. I’m sure some computer scientists will come up with better models eventually but right now this is the Wild West.

In his book On Task, David Badre talks extensively about how our brains use cognitive control to handle multiple tasks, track them to completion, and so on. Really, you only need to read the first two or three chapters to see how complex a problem this is.

Anyways, I’ve been putting thought into how to handle task switching and, specifically, carrying tasks through to completion. I would caution against brute-force methods. The reason is that you will end up building a custom solution that only works for one particular architecture and then breaks as soon as you make other structural changes. In effect, I recommend you create a solution that can engage with an arbitrary data source, such as a SQL database or a search index like ElasticSearch or SOLR.

It just so happens I was reading up on txtai, which should aid us greatly in this problem.

Let us assume that you have a database of all past user interactions in something like SQLite or ElasticSearch. Every human could handle the query “Call mom” if they are talking with someone they are close to, say a spouse or a sibling.
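As a minimal illustration (the schema is just my own guess at what you’d want, not a prescription), logging every interaction to SQLite could look like this:

```python
# Hypothetical interaction log in SQLite; table layout and field names are
# assumptions for illustration.
import sqlite3
import time

conn = sqlite3.connect("assistant.db")
conn.execute("""CREATE TABLE IF NOT EXISTS interactions
                (ts REAL, user TEXT, role TEXT, content TEXT)""")

def log_turn(user, role, content):
    conn.execute("INSERT INTO interactions VALUES (?, ?, ?, ?)",
                 (time.time(), user, role, content))
    conn.commit()

def recent_turns(user, limit=20):
    cursor = conn.execute("""SELECT role, content FROM interactions
                             WHERE user = ? ORDER BY ts DESC LIMIT ?""",
                          (user, limit))
    return list(reversed(cursor.fetchall()))         # oldest first
```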

First you have to establish intention - I recommend establishing intent with every new message (be it chat, verbal, etc.). This is the first method of detecting changes in subject/task. I have some intent prompt examples in the Appendix of my book. So with a good intent extractor you’ll get something like “End user NSY wants me to call their mother.”
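A bare-bones version of that extraction step might look like the following; the prompt wording and engine choice are purely illustrative and not the examples from the book:

```python
# Hedged sketch of per-message intent extraction with a completion prompt.
# Prompt text, engine name and output format are assumptions; assumes
# openai.api_key is already set.
import openai

INTENT_PROMPT = """Extract the user's intention as one short sentence.

Message: {message}
Intention:"""

def extract_intent(user, message):
    response = openai.Completion.create(
        engine="davinci",                            # or a fine-tuned model
        prompt=INTENT_PROMPT.format(message=message),
        max_tokens=40,
        temperature=0,
        stop=["\n"],
    )
    intention = response.choices[0].text.strip()
    return f"End user {user} wants me to {intention}"
```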

This intent raises a few questions and must be mapped to a few information problems:

  1. Who is NSY’s mother?
  2. What is NSY’s mother’s phone number?

Fortunately for you, I just wrote a fine-tuned question generator that can assist with this exact kind of problem! (internal questions are one way of articulating what our brains do when engaging with tasks, also, curiosity)

You must then use semantic search over your data to answer those questions. If the information isn’t available, you must generate a dialog to ask the end user. This method of breaking tasks into constituent information problems is basically how the brain works: the brain holds each problem in working memory.
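With txtai, that lookup over stored information can be quite short; the indexed facts below are invented purely for illustration:

```python
# Sketch of answering the generated questions with txtai semantic search.
# The stored "facts" and the model path are invented for this example.
from txtai.embeddings import Embeddings

embeddings = Embeddings({"path": "sentence-transformers/nli-mpnet-base-v2"})

facts = [
    "NSY's mother is named Ruth",                    # hypothetical stored knowledge
    "Ruth's phone number is 555-0100",
]
embeddings.index([(i, text, None) for i, text in enumerate(facts)])

def answer(question, limit=1):
    results = embeddings.search(question, limit)     # list of (id, score) pairs
    return [facts[uid] for uid, _ in results]

# answer("What is NSY's mother's phone number?") should surface the phone number;
# if the top score is low, fall back to asking the user.
```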

Anyways, I’m working on a better Shared Database service as outlined in the book. I’ll post about it when I’m done. It should help with these kinds of problems immensely, as it will have integrated semantic search, greatly enhancing its ability to answer questions.

EDIT: One thing I did wrong with the existing question asking dataset is that I didn’t include usernames. I will have to fix that one day.

4 Likes

Wow, that’s a great reply. Thanks a lot, @daveshapautomator!
It seems I’m not that far from where you’re suggesting I go.
My fine-tuned model can usually respond with the right completion.
I implemented a semantic search filter to detect a subject change and clear the prompt when the subject has changed. The problem remains that a very short or vague user prompt can cause the semantic search to misclassify the intention and clear the prompt when it isn’t needed, or vice versa.
However, adding a middle step where the main model analyzes the user’s intention seems intriguing, as it can break the pattern that caused the error (although it will cost more tokens). I’ll give it a try!
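Roughly what I have in mind for that middle step (the helper names are placeholders for whatever intent extractor and similarity check end up being used):

```python
# Sketch of the proposed middle step: extract the intent first, then run the
# topic-change check on the extracted intent rather than on the raw, often very
# short, user message. Both helpers are injected; their names are hypothetical.
def handle_message(history, message, extract_intent, is_new_topic):
    intent = extract_intent(message)                 # one extra completion per turn
    if is_new_topic(history, intent):
        history = []                                 # fresh task -> start from a clean prompt
    history.append(f"User: {message}")
    return history, intent
```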

2 Likes

Yeah so the Question Generator can help with vague queries by giving you the ability to ask follow-up and clarifying questions.

I had this idea that I wanted to make a chatbot that ONLY asks questions. I think such a thing could still be quite useful.

2 Likes

Do you think such a “self-assessment” task can be performed by a fine-tuned Ada or Babbage, or can only Curie or Davinci do it?
Otherwise it will get quite costly, as the completions will become far longer.

I doubt it. CURIE seems to be the bottom end of what is capable of generating more open-ended output. I would expect BABBAGE to be able to handle more complex labeling tasks, but not necessarily generating salient questions. I will say, though, that a fine-tuned CURIE seems to outperform general purpose DAVINCI. So perhaps a fine-tuned BABBAGE could outperform a vanilla CURIE. For the cost savings it would certainly be worth trying. I might go back to my question generator and do a side by side. I guess I should have done that anyway.

2 Likes

@daveshapautomator Did you ever go back and try a Babbage version of the question generator? I stumbled upon your question generator project and am curious how far it could go. Thanks!

1 Like

Hey @daveshapautomator, two years and GPT-4 later, what’s your take on this today?

1 Like