Lately, I’ve been working on a few client projects where the communication flow is complex and non-linear.
Imaginary example (as I’m under NDA): the user is participating in a tech interview.
High-level process for our imaginary example:
- AI asks the question.
- Then the User has the option to ask clarifying questions (`request_for_info` branch) or go on to answer the question (`answering_the_question` branch). Both branches may consist of 1 or several back-and-forth messages:
  - in the `request_for_info` branch the user may ask 1 question in 1 message, or several questions broken down into several messages
  - same in the `answering_the_question` branch: the user may answer in 1 message, or start answering in 1 message and continue in 1 or several more
- If the User went to the `request_for_info` branch, once it’s finished, he is moved to the `answering_the_question` branch.
- Once the user has finished `answering_the_question`, we move to the `feedback` branch.
- For simplicity of the example, the `feedback` branch is just the AI sending 1 feedback message.
- Also, for simplicity of the case, we are not using any kind of RAG here.
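For reference, the flow above can be sketched as a tiny state machine. The state names come from the example; the transition table is my assumption of the simplest reading of it:

```python
# Minimal sketch of the interview flow as a state machine.
# States and transitions mirror the branches described above.

# Allowed transitions: from each state, where the conversation may go next.
TRANSITIONS = {
    "question": {"request_for_info", "answering_the_question"},
    "request_for_info": {"request_for_info", "answering_the_question"},  # may loop
    "answering_the_question": {"answering_the_question", "feedback"},    # may loop
    "feedback": set(),  # terminal: AI sends 1 feedback message
}

def advance(current: str, proposed: str) -> str:
    """Move to `proposed` if the flow allows it, otherwise stay put."""
    return proposed if proposed in TRANSITIONS[current] else current
```

The point of keeping this table in the app is that whatever the model hallucinates about the process, the app has the final say on which stage comes next.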
My process and thoughts on solving this
1. I started with an approach I called 1-prompt-flow. I basically explain the whole process in the prompt and ask the AI to answer in JSON, so that not only can it monitor the process, but I can do so as well in my app… And it worked like magic. Except it only worked stably on gpt-4, which is too slow and expensive for the projects I’m working on. So while it is an elegant solution, I had to drop it.
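To make the 1-prompt-flow idea concrete, here is a rough sketch of the contract I mean. The field names and prompt text are made up for illustration; the key is that one call returns both the user-facing reply and the process state:

```python
import json

# Hypothetical system prompt: describes the whole flow and the JSON contract.
SYSTEM_PROMPT = """You are running a tech interview with these stages:
question -> request_for_info -> answering_the_question -> feedback.
Reply ONLY with JSON:
{"reply": <text for the user>, "stage": <current stage>, "stage_finished": <true/false>}"""

VALID_STAGES = {"question", "request_for_info", "answering_the_question", "feedback"}

def parse_model_reply(raw: str) -> dict:
    """Parse the model's JSON so the app can track the stage alongside the AI."""
    data = json.loads(raw)
    if data.get("stage") not in VALID_STAGES:
        raise ValueError(f"model reported unknown stage: {data.get('stage')!r}")
    return data
```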
2. Then I decided to offload part of the logic from the AI to the app. It was no longer 1 prompt but several, each responsible for a different stage of the process, yet the answer to the User and the status information were still combined in 1 JSON message.
It worked perfectly for 1 of the projects, but not for the others. The main challenge I faced here is moving between the steps. When outputting JSON, the model seems to spend too much of its attention on it, and starts to ignore other prompt instructions.
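Approach 2 in code form, roughly: one prompt per stage, each still asked to bundle the reply and the status into a single JSON message, while the app owns the transitions. Prompt texts and field names are placeholders:

```python
import json

# One placeholder prompt per stage; the app picks which one to send.
STAGE_PROMPTS = {
    "request_for_info": 'Answer the candidate\'s clarifying question. '
                        'Reply as JSON: {"reply": ..., "stage_finished": ...}',
    "answering_the_question": 'React to the candidate\'s answer so far. '
                              'Reply as JSON: {"reply": ..., "stage_finished": ...}',
}

# The app, not the model, owns what comes after each stage.
NEXT_STAGE = {
    "request_for_info": "answering_the_question",
    "answering_the_question": "feedback",
}

def handle_turn(stage: str, raw_model_output: str) -> tuple:
    """Return (text for the user, next stage) from one JSON model message."""
    data = json.loads(raw_model_output)
    next_stage = NEXT_STAGE[stage] if data["stage_finished"] else stage
    return data["reply"], next_stage
```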
3. As the 3rd approach, I broke getting an answer and getting the status of the process into 2 separate prompts for each of the stages: 1 prompt produces the answer, and another analyzes where we are now and whether we should move to another stage, outputting JSON I can use as logical traffic lights in my app. And this is where I encountered another problem: while this approach worked for some cases, it is impossible to get stable results for others.
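The third approach, sketched under the same assumptions: the answering call returns plain text for the user, and a second, separate call acts purely as the “traffic light” classifier. The schema here is hypothetical:

```python
import json

# Second call per turn: a pure classifier. It never talks to the user;
# it only reports whether the current stage is done.
CLASSIFIER_PROMPT = (
    "Given the conversation so far, output ONLY JSON: "
    '{"stage_finished": true or false}'
)

def traffic_light(classifier_output: str) -> bool:
    """Green light = move to the next stage; red = stay in the current one."""
    return bool(json.loads(classifier_output)["stage_finished"])
```

Splitting the calls means the answering prompt no longer has to carry JSON-formatting instructions at all, which is exactly the attention problem approach 2 ran into.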
Where the model stumbles the most is when at a certain point we may get a deliberate choice from the user (“Do you want to request more info (yes) or answer the question (no)?”), but at the same time the user may go on directly asking their questions or answering the AI’s question. Even this can be handled. The real problem comes when the answer to the question may itself include a question.
4. After all of this, I came to the conclusion that at least at certain points of the process I need to have buttons (“I want to know more” / “I want to answer the question”), as the 3.5 family of models has some limitations on understanding the User’s intent. Though it is a perfectly working solution, I kinda cringe at it (we have the power of a digital mind at our fingertips, but still have to use such relic interface solutions).
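The button fallback boils down to letting an explicit UI event override intent detection entirely; a minimal sketch, with made-up button IDs:

```python
# When the user clicks a button, the intent is certain and the
# model-based classifier is skipped for that turn.
BUTTON_INTENTS = {
    "btn_more_info": "request_for_info",
    "btn_answer": "answering_the_question",
}

def route(button_id, classified_intent: str) -> str:
    """Prefer the explicit button click; fall back to the model's guess."""
    if button_id is not None:
        return BUTTON_INTENTS[button_id]
    return classified_intent  # only trusted when no button was pressed
```

The compromise is that buttons only gate the ambiguous transition points; free-text turns inside a branch still flow through the classifier.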
So, my question is:
Champs, how do YOU deal with such situations both philosophically and practically?