Hello,
I’m experimenting with the realtime API to see whether it can be used as a booking assistant. To look up the user’s details in the back end, the agent needs to ask for things like a licence plate number or a booking reference, or to get the exact spelling of their name.
Does anyone have any tips for capturing this kind of information reliably? In my current experiments it’s not bad, but I’d say the accuracy is around 70%, which is nowhere near good enough for a system like this.
I am also finding that even when the input text transcript shows the correct information, the LLM’s spoken output plays back something different.
Here’s an example of it going wrong (chat transcript):
LLM: Hello. How can I help you today?
User: Yeah, I need to check my booking details.
LLM: Sure, I can help you with that. Do you have your booking reference handy?
User: Yeah, it’s M-A-R-17-M.
LLM: Thank you. Let me confirm your booking reference. You said “M-A-R 1-7-8.” Is that correct?
User: No, M-A-R-1-7-M.
LLM: Got it. “M-A-R 1-7-9.” Is that correct?
User: No, M A R 1 7 M.
LLM: Thank you for clarifying. I’ll check the details for booking reference “M-A-R 1-7-F.” Please hold on a moment.
LLM: {“booking_reference”:“MAR17F”}
As you can see, the LLM got it wrong three times even though the input transcript was correct each time. On the last try it stopped asking for confirmation and made a function call with the incorrect reference.
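Since the input transcript was right every time in the example, one possible safety net is to check the function call argument against what the user actually said before calling the back end. Below is a rough sketch of that idea in Python, not a tested solution. The helper names `normalize_reference` and `reference_matches_transcript` are made up for illustration, and it assumes the app keeps the user turns of the input transcript as plain strings.

```python
import re


def normalize_reference(text: str) -> str:
    """Keep only ASCII letters and digits, upper-cased.

    Example: 'M-A-R 1-7-M.' becomes 'MAR17M'.
    """
    return re.sub(r"[^A-Za-z0-9]", "", text).upper()


def reference_matches_transcript(argument: str, user_turns: list[str]) -> bool:
    """Return True if the normalized argument appears in any normalized user turn.

    Hypothetical guard: run it on the function call argument before the
    back-end lookup, and re-prompt the user if it returns False.
    """
    target = normalize_reference(argument)
    if not target:
        return False
    return any(target in normalize_reference(turn) for turn in user_turns)


# User turns taken from the transcript above.
user_turns = [
    "Yeah, it’s M-A-R-17-M.",
    "No, M-A-R-1-7-M.",
    "No, M A R 1 7 M.",
]

print(reference_matches_transcript("MAR17F", user_turns))  # argument the LLM sent
print(reference_matches_transcript("MAR17M", user_turns))  # what the user said
```

This only catches a mismatch; it doesn’t fix the model’s hearing. The substring match is also loose, so very short references could match by accident, and it would need to be tightened for real use.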
Thanks for any tips you have that might improve this.