We’ve built an autonomous customer service agent named “Emily” for our marketplace, where gamers can pay pro players to play on the same team (companionship, coaching, etc.). With nearly 4 million monthly messages exchanged between customers, pro players, and our customer service, Emily plays a pivotal role in mediating these interactions.
Background:
Emily runs as 17 separate LLM applications on GPT-4, each extracting data from different databases at different stages for the demand and supply sides.
She has multiple roles: chatting in DMs for onboarding, support, and sales; mediating order chats for progress tracking, scheduling, and more.
We’ve done hundreds of iterations on those 17 LLM applications to build Emily-1.0 and to make sure Emily’s identity as a marketplace representative is clear: she shouldn’t impersonate customers or pro players.
Challenge: Despite multiple iterations on our prompts, in 3% to 18% of cases depending on the LLM app, inside order chats with 3 participants (customer, pro player, Emily), Emily ends up taking the role of a customer or a pro player when she shouldn’t. For instance, she may jump into a chat responding as if she’s the pro player, which is not her intended behavior.
Example:
The highest error rate (18%) occurs in the LLM app (prompt) that is triggered at a specific time to check whether the scheduled gaming session between the parties has started on time.
So basically, she needs to check the order chat history, our database, and her prompt, and, depending on the situation, write certain messages and activate commands that confirm the order has started or the problem has been resolved. Instead, she hallucinates and acts as the pro player or the client, pretending to be one of them and responding on their behalf.
What We’ve Tried:
We’ve iterated on this issue in our prompts over 20 times.
Explicitly mentioned in our prompts that Emily is solely a manager and shouldn’t take on any other role in our marketplace.
Utilized various plugins to refine and improve prompt construction.
We’re reaching out to this talented community for insights or experiences that might help us refine Emily’s behavior. Have you faced similar challenges? What strategies or techniques worked for you? Any feedback or advice would be greatly appreciated!
Thanks for your time and looking forward to the collective wisdom!
The first thing that jumps out here is the third-person aspect: the models are tuned to respond to a single person at a time. Perhaps not explicitly, but due to the nature of the training and the feedback mechanism, that is what gets instilled.
I think the LLM is having difficulty when it is not clear that it is the third member of a group conversation. Emphasizing that point more, maybe with a short example conversation as a “shot”, would probably help.
I’d just add that you want to make it clear to end-user that it is an AI, so you don’t break OpenAI Terms of Service.
What exactly do you mean by this?
Can you show your prompt?
You might need to give an example in your user/assistant prompts… possibly a two-shot or three-shot example of how Emily should behave in a conversation…
ETA: I like @_j’s idea to use the lesser-known “name” parameter for the assistant messages…
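For what it’s worth, here’s a minimal sketch of what that could look like (my own illustration, not the original setup; the participant names, wording, and SDK style are assumptions): one “shot” of the desired behaviour, with every message labelled via the optional `name` field so the model can tell Emily apart from the client and the pro player.

```python
# Hypothetical sketch: label every participant with the optional "name" field
# and include a one-shot example of the desired mediator behaviour before the
# live chat. Names and content are made up.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

messages = [
    {"role": "system",
     "content": ("You are Emily, the Legionfarm mediator in a three-person order chat. "
                 "You never speak as the client or the pro player.")},
    # one-shot example of the desired behaviour
    {"role": "user", "name": "client_1234", "content": "Hi! I just placed the order..."},
    {"role": "user", "name": "pro_player_alex",
     "content": "Hi! I am your pro-player for order #1234. Starting the session now."},
    {"role": "assistant", "name": "Emily",
     "content": ("Hey everyone, it appears the session started as scheduled. "
                 "Please reach out if you need any assistance.")},
    # ...the live order chat to mediate would be appended here...
]

response = client.chat.completions.create(model="gpt-4", messages=messages)
print(response.choices[0].message.content)
```

If I remember right, the `name` value has to be a short identifier without spaces, so something like `client_1234` rather than a display name.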
It could be the Waluigi Effect. You may be too controlling in your approach, teaching bad behavior in terms of token frequency that the model then blindly follows. Do you have a lot of “don’t do this…” or “only behave like this…” in your prompt?
Thanks for the advice about making it clear to users that it’s an AI.
Didn’t know about this requirement in the ToS; we just made changes so now she will say that in her intro, and this info is going to be visible to users in our product during interactions with her.
Here’s the prompt for one of the LLM apps. This one is the simplest and shortest, but it’s also the one where we see the biggest percentage of mistakes:
You are “Emily”, the dedicated personal assistant for clients at Legionfarm. This platform allows everyday gamers to hire professional players for collaborative gaming sessions. Think of us as the “Uber” for gaming: we connect clients with Pro players and retain a commission for facilitating the connection. Your primary responsibility is to ensure that the order proceeds smoothly and that the Pro player adheres to the correct protocols, such as initiating the gaming session and updating the customer about its commencement. Always remember: you are to maintain your identity as “Emily” and should never impersonate either the client or the Pro player.
Participants:
Client: Recognizable by nicknames containing ‘client’. Their typical opening line is “Hi! I just placed the order…”.
Pro Player: Their introduction is “Hi! I am your pro-player for order…”.
You: Emily, the mediator.
You possess the names and details of both the customer and the pro player, which can be found in the provided {{context}}.
Steps to Follow:
Review the Chat: Take a moment to understand the current state of communication.
Check Pro Player’s Engagement: Ensure the pro player has started the gaming session for this order. You’ll find this information in the latest messages of this order chat.
Action Based on Scenarios:
Scenario A: Pro Player started the session
If the pro player initiates the session and the customer doesn’t raise any concerns about the starting process or the scheduled time, then say something like: Hey everyone, it appears the session started as scheduled. Please reach out if you need any assistance.
Scenario B: Pro Player didn’t start the session
If the Pro player hasn’t initiated the session, inform the client that you’ve called the Pro player to ping them, and use these 2 commands:
{make_call}
{send_telegram:'{"message":"The time has come to start the order, but you have not designated the gaming session as started. Please start the session or agree on a delay with the client via order chat <Order #, Order name>."}'}
Your prompt is way too convoluted. You need to break it down and start separating your concerns. I’m assuming that you are running this on every message? Do you run it only on a single message, or on the whole conversation? If these were my job instructions, I would seriously not know what to do.
I’m starting to notice a warning flag when people use “If, then, when, but” sequences in their prompts. Not necessarily wrong, but I think they can be broken down and managed better with a sprinkle of programming logic.
To me it seems like you are trying to use an LLM to completely manage your orders/ready status through some sort of communication platform (Discord?). This is truly bizarre to me. These systems have existed for a long time, before LLMs.
In previous cases, a user would use a simple /[command]. But, I get it. LLMs. In this case a classifier would do the same.
You could start with a classifier to determine what the message is regarding, then craft your prompt based on that (see the sketch below). There is so much noise in this prompt it made me go cross-eyed.
The wider you cast your net, the more crap you will accidentally catch, and the less control you have over it all.
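To make the classifier idea concrete, here is a rough sketch. The label set and the stub actions (`post_as_emily`, `make_call`, `send_telegram_reminder`) are placeholders I made up for whatever the real {make_call} / {send_telegram:…} commands actually do; this is not meant as the actual implementation.

```python
# Rough sketch of "classify, then act": the model only picks a label, and
# ordinary code decides what Emily posts and which commands fire.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

LABELS = ["session_started", "session_not_started", "schedule_dispute", "other"]

def classify_order_chat(chat_history: str) -> str:
    """Ask the model for a single label instead of a free-form reply."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system",
             "content": "Classify this order chat into exactly one of: "
                        + ", ".join(LABELS) + ". Reply with the label only."},
            {"role": "user", "content": chat_history},
        ],
    )
    return response.choices[0].message.content.strip()

def post_as_emily(text: str) -> None:       # placeholder for the real chat API
    print(f"Emily: {text}")

def make_call() -> None:                    # placeholder for {make_call}
    print("calling pro player...")

def send_telegram_reminder() -> None:       # placeholder for {send_telegram:...}
    print("telegram reminder sent")

def handle(chat_history: str) -> None:
    label = classify_order_chat(chat_history)
    if label == "session_started":
        post_as_emily("Hey everyone, it appears the session started as scheduled. "
                      "Please reach out if you need any assistance.")
    elif label == "session_not_started":
        make_call()
        send_telegram_reminder()
    # schedule_dispute / other can each route to a narrower, purpose-built prompt
```

The point is that the model only ever has to pick a label; the wording Emily posts and the commands she fires are decided by plain code, so there is nothing left for her to hallucinate on behalf of the other participants.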
I was hoping this would be a cool example of the Waluigi Effect, but I now think this is the LLM simply being utterly lost.
You are 100% correct: if a person has a hard time coming up with an answer, GPT will too. It will always give you AN answer, so it can help with “hump” problems that just need a kick start, but it is using a neural network similar to the one in our heads… it will have similar issues with vagueness, illogic, and imprecise requests.
The solution space available to the model is almost infinite, at least 10^600 “locations” in latent space where answers can be found, and humans are only interested in an almost infinitely small subset of that space. If you don’t narrow down the options the model has to work with, by using solid, well-thought-out prompting, it WILL find a solution somewhere, just not one most humans will understand.
Actually we gave it examples and it works perfectly now.
We iterated on prompts hundreds of times, and to be honest, that’s the shortest prompt; we have longer and more complicated ones. Other types of prompt didn’t really work. Just wondering if you’ve ever automated CS with LLMs or know of good cases? It would be nice to learn from people who have implemented LLMs at high scale in an ops-heavy business to improve retention.
As the CEO and founder of a company that generates revenue and employs 70 people, it’s important for me to generate profits and increase our revenue. I’m not a big expert in ML or LLMs, but we built something that performs autonomously in our CS (sales, support, etc.), and our purchase 1 → 2 conversion has already grown by 1.5x on a statistically significant number of new customers over the last 2 months.
I’ve received numerous suggestions to try simpler solutions or to use LLaMA, but our current system is working effectively for us. We’ve found that building our entire customer service on LLMs is not only faster, it also allows our operations team to iterate on it with ease.
Did you ever give the model itself (GPT-4) a typical input and expected output and ask it to generate a prompt to achieve it? That’s my current method; it removed 90% of the iterative effort. Then you can feed the model the prompt inside some block markers and explain the erroneous output you are getting, and most of the time it will nail it on the first attempt.
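As a rough illustration of that workflow (the meta-prompt wording and the example transcripts below are mine and purely hypothetical, not a recommended phrasing):

```python
# Hypothetical sketch of asking the model to write the prompt for you:
# give it a sample input, the output you expected, and the output you got.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

meta_prompt = """
I am building an assistant called Emily that mediates a three-person order chat.

Typical input (order chat history):
<<<INPUT
Client: Hi! I just placed the order...
Pro player: Hi! I am your pro-player for order #1234. Starting the session now.
INPUT>>>

Output I expect from Emily:
<<<EXPECTED
Hey everyone, it appears the session started as scheduled. Please reach out if you need any assistance.
EXPECTED>>>

Erroneous output I currently get (Emily impersonates the pro player):
<<<ACTUAL
Hi! I'm your pro player, let's start the session!
ACTUAL>>>

Write a system prompt that reliably produces the expected behaviour and never the erroneous one.
"""

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": meta_prompt}],
)
print(response.choices[0].message.content)
```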
If you stick around I think you’ll find a lot of very valuable insights. I know I have.
No, I haven’t done any customer support with an LLM (besides simple chatbots to aid with a web app and general knowledge, if that counts). I think it’s a massive market, though, and it would be MUCH preferable to what is common now.
I am… or was an avid gamer once upon a time, though, and I recall the days of finding matches through IRC channels using typical command-based ready systems. I’m definitely interested in seeing how an LLM functions in this case.