Complex non-linear AI chat flows in gpt-3.5, how do you deal with them?

Lately, I’ve been working on a few clients’ projects where the scenario of communication is complex and non-linear.

Imaginary example (as I’m under NDA): the user is participating in a tech interview.

High-level process for our imaginary example:

  1. AI asks the question.
  2. Then the User has the option to ask clarifying questions (the request_for_info branch) or go on to answer the question (the answering_the_question branch). Both branches may consist of 1 or several back-and-forth messages:
  • in the request_for_info branch the user may ask 1 question in 1 message, or several questions broken down into several messages
  • same in the answering_the_question branch: the user may answer in 1 message or start answering in 1 message and continue in 1 or several more
  3. If the User went to the request_for_info branch, once it’s finished, he is moved to the answering_the_question branch.
  4. Once the user has finished the answering_the_question branch, we move to the feedback branch.
  5. For simplicity of the example, the feedback branch is just the AI sending 1 feedback message.
  6. Also, for simplicity of the case, we are not using any kind of RAG here.

My process and thoughts on solving this

1. I started with an approach I called 1-prompt-flow. I basically explain the whole process in the prompt and ask the AI to answer in JSON, so that not only can it monitor the process, but my app can as well… And it worked like magic. Except it only worked in a stable manner in gpt-4, which is too slow and expensive for the projects I’m working on. So while it is an elegant solution, I had to drop it.
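For concreteness, here is a minimal sketch of what I mean by 1-prompt-flow. The prompt text, stage names, and JSON field names are illustrative, not my actual prompt; the point is that the model tracks the stage itself and the app parses the same JSON:

```python
import json

# The whole process lives in one system prompt; the model must reply with
# its message plus the stage it believes the conversation is in.
SYSTEM_PROMPT = """You are conducting a tech interview.
Stages: asking_question -> request_for_info -> answering_the_question -> feedback.
Always reply with a single JSON object:
{"stage": "<current stage>", "message": "<your reply to the user>"}"""

def parse_turn(raw: str) -> dict:
    """Parse the model's JSON reply; fall back safely when the contract breaks."""
    try:
        data = json.loads(raw)
        if isinstance(data, dict) and "stage" in data and "message" in data:
            return data
    except json.JSONDecodeError:
        pass
    # gpt-3.5 breaks the JSON contract often enough that a fallback is essential
    return {"stage": "unknown", "message": raw}
```

The fallback branch is exactly where this approach hurt on 3.5: the cheaper model drops the JSON contract too often for the app to rely on it.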

2. Then I decided to offload part of the logic from the AI to the app. Here it was no longer 1 prompt but several, each responsible for a different stage of the process, but the answer to the User and the status information were still combined in 1 JSON message.

It worked perfectly for 1 of the projects, but not for the others. The main challenge I faced here is moving between the steps: when outputting JSON, the model seems to spend too much of its attention on the JSON structure, so it starts to ignore other prompt instructions.

3. As the 3rd approach, I broke getting the answer and the status of the process into 2 separate prompts for each stage: 1 prompt answers the User, and another analyzes where we are now, decides whether we should move to another stage, and outputs JSON I can use as logical traffic lights in my app. And this is where I encountered another problem: while this approach worked for some cases, I couldn’t get stable results for others.
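A minimal sketch of that split, with the app owning the transitions (the field name move_on and the stage graph below are my own illustrative choices, not a fixed contract): the answering prompt is a plain chat call, while a second classifier prompt returns the traffic-light JSON that this code consumes.

```python
import json

# Stage graph from the interview example; the app, not the model, owns it.
NEXT_STAGE = {
    "request_for_info": "answering_the_question",
    "answering_the_question": "feedback",
}

def advance(stage: str, classifier_json: str) -> str:
    """Apply the classifier's verdict; stay put on malformed or negative output."""
    try:
        verdict = json.loads(classifier_json)
    except json.JSONDecodeError:
        return stage  # unreliable classifier output should never break the flow
    if verdict.get("move_on"):
        return NEXT_STAGE.get(stage, stage)
    return stage
```

The instability I hit lives entirely in the classifier call: the code above is trivially correct, but getting 3.5 to emit `{"move_on": true}` at the right moment is not.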

Where the model stumbles the most is when at a certain point we may get a deliberate choice from the user (“Do you want to request more info (yes) or answer the question (no)?”), but at the same time the user may go on directly asking their questions or answering the AI’s question. Even this can be handled. The real problem comes when the answer to the question may itself include a question.

4. After all of this I came to the conclusion that at least at certain points of the process I need to have buttons (“I want to know more”/“I want to answer the question”), as the 3.5 family of models has some limitations in understanding the User’s intent. Though it is a perfectly working solution, it makes me cringe a bit (we have the power of the digital mind at our fingertips, but still have to use such relic interface solutions).
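The upside of the button fallback is that it can stay cheap: button clicks route deterministically with zero model calls, and only free text falls through to an intent-classification prompt (the button labels and the fallback marker below are illustrative):

```python
# Buttons map straight to branches; anything else goes to an
# intent-classification prompt (represented here by a placeholder string).
BUTTONS = {
    "I want to know more": "request_for_info",
    "I want to answer the question": "answering_the_question",
}

def route(user_input: str) -> str:
    if user_input in BUTTONS:
        return BUTTONS[user_input]  # deterministic, no model call needed
    return "classify_with_llm"      # fall back to a gpt-3.5 intent prompt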

So, my question is:

Champs, how do YOU deal with such situations both philosophically and practically?


@cass @_j @Foxabilo curious to hear your thoughts

For the moment, I’m not! I understand what you are trying to build, and it will be the way things work in the future, but right now… the tech is not there. This really requires a model that can learn from each interaction and understand the complex requirements of balancing AI and human control in a conversation.

You have issues like feedback loops, where clarification is required to ensure all of the required information is pulled in, and the question of how to make the best use of traditional, proven, fast data-input methods. Text is often the worst possible system for data gathering: humans can be wildly inaccurate when given a large option range, which is why date pickers limit the user so much and why we have built all kinds of novel and actually useful interfaces for selecting options. Getting that right is a model generation or two away.

At this stage I think it best to drill down to what the user actually needs, use a hybrid of traditional UI/UX, and make sparing, judicious use of AI where the input cannot/should not be constrained.

Current systems lack context about the user’s life. When AI systems have access to a person’s “life feed”, they will require far less input, as most of the requirements can be inferred from contextual cues: why is my human asking about sales records? Oh, yes, that email that just arrived is asking about sales records… I’ll go read the email and prepare the data, so when my human does come and ask, I’ll have it prepared the way it was done last time… etc., etc. We are just not there yet.

I think the key is to create a conversational flow that feels natural and intuitive to the user. The problem is that a million humans will have a million ways of approaching the same task, so you have to “herd” them while there is still a gap between what we think the models can do and what they can actually do.


Thank you, Spencer! So buttons it is. At least for now.

But still curious to hear what others think!


I think a really good use case for AI is support bots that can determine whether the user’s query is an edge case lying outside the 95th-percentile solution space offered by the knowledge base, and pass that user off to a Level 2 support agent.

Also, you can save the user the time and effort of navigating the knowledge base by presenting the top_k solutions as options right away and iteratively refining them if none are suitable. I think that side of things has real value to offer companies, as Level 1 support is the majority of the cost in most setups.


Great use case @TonyAIChamp . I have some experience with similar work flows.

The only success I have had so far is mixing “classic” UI (buttons and flows) with a fleet of specific targeted threads in the GPT-3.5 model overseen by a “manager” GPT-4 model.

I would advise building that way for now. In six months to a year there will likely be a more sophisticated model that can remember multi-step progress. Best guess.


Yeah, it seems that in complex production apps (unless we are using fine-tuned models) this is the only way for now.

As for your “manager” solution: I was thinking about it too, but in all of my cases GPT-4 adds too much latency to be usable. Though I may think of some way for the manager to work in parallel. Does yours work synchronously with the user flow?


We built a system like this early this year, and the approach we had was to have multiple “subsystems” listening to the user conversation, each tasked with maintaining some internal state.

A pertinent aggregate of the internal state of each subsystem contributed to the prompt of the chat system in real time, so to speak (although we didn’t interrupt generation at the time).

some of the subsystems we had:

  • objective manager (figure out what’s been accomplished, and what still needs to happen, preloaded with a task list)
  • conversation steerer (basically guardrails additionally informed by active objectives)
  • analysis
  • and some other stuff

We were afraid that subsystem information would be available “too late” (it would always be at least one response behind), but this didn’t turn out to be an issue. The most critical part, I suppose, was to ensure that the prompt was kept clean - and that’s what the subsystems accidentally excelled at.
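As a rough illustration of the pattern described above (class and field names are mine, not the actual system’s): each subsystem watches the transcript and keeps its own state, one turn behind, and an aggregate of those states is injected into the chat prompt on the next turn.

```python
# One toy subsystem: an objective manager preloaded with a task list.
class ObjectiveManager:
    def __init__(self, tasks):
        self.pending = list(tasks)

    def observe(self, user_message: str) -> None:
        # Naive substring matching for the sketch; the real system used a
        # model call per subsystem to decide what was accomplished.
        self.pending = [t for t in self.pending
                        if t.lower() not in user_message.lower()]

    def state(self) -> str:
        return f"Open objectives: {', '.join(self.pending) or 'none'}"

def build_prompt(base_prompt: str, subsystems) -> str:
    """Inject the (one-turn-stale) subsystem states into the chat prompt."""
    notes = "\n".join(s.state() for s in subsystems)
    return f"{base_prompt}\n\n[Subsystem notes]\n{notes}"
```

Because `build_prompt` only reads each subsystem’s last known state, the chat call never waits on the subsystems, which is why the one-response lag was acceptable.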

That worked pretty well and absolutely delighted the users - particularly in how flexible it was. However, it was rather expensive (average ~$4 for a 15-30 minute conversation) - although we had a bug that might have contributed to a cost explosion at the time.


Niiice! Were you using 3.5 or 4? What was the average time to respond to the user’s message?

thanks :slight_smile:

according to the bill it seems like it was mostly gpt-4

I can’t find any pics of the subsystem control surface anymore, but every subsystem could specify its own model.

time to first response was practically zero because we used streaming, and because we decided to accept subsystem lag, as mentioned before.

edit: I recall that first response wasn’t always zero: a serverless cold start could cause up to ~5 seconds of delay, but during active use that wasn’t an issue. But that wasn’t an AI problem.


Spent a few hours today writing something that will help me develop such complex non-linear LLM apps faster and with less pain: GitHub - TonySimonovsky/AIConversationFlow

If you feel this is useless, don’t be shy to tell me (with some reasoning, for example: “there is actually library N that does that”) :slight_smile:
