Hey there,
Right now I’m using the Realtime API to call simple functions intelligently. I used the Twilio + OpenAI app as a template.
I’ve noticed that for functions with multiple inputs, the Realtime API often misses a digit or two of an input. The problem is that when I speak the input again, the Realtime API often stubbornly sticks to its original value.
I output the conversation.item.input_audio_transcription.completed events, and it’s clear that it does hear what I’m saying, but for some reason that doesn’t make it directly into the function call.
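To make the mismatch concrete, here is a rough sketch of how the two events could be compared. It assumes the transcription event carries the text in a "transcript" field, that the function call arrives as a response.function_call_arguments.done event with a JSON string in "arguments", and that the utterance contains only the EIN digits written as numerals. The helper names are hypothetical and not part of my app.

```python
import json
import re


def digits_only(text: str) -> str:
    """Keep only the digit characters of a string."""
    return re.sub(r"\D", "", text)


def ein_matches_transcript(transcript_event: dict, function_call_event: dict) -> bool:
    """Compare the digits heard in the transcript with the EIN in the function call.

    Assumption: the transcript contains the EIN as numerals and no other digits.
    """
    heard = digits_only(transcript_event.get("transcript", ""))
    # "arguments" is assumed to be a JSON-encoded string of the function inputs.
    arguments = json.loads(function_call_event.get("arguments") or "{}")
    called = digits_only(str(arguments.get("ein", "")))
    return bool(called) and called == heard
```

Logging the result of this check next to each function call would show how often the transcript and the arguments disagree. It would not catch a transcript that spells digits out as words.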
Has anyone else had this same issue?
For example, here’s one of my functions registered with OpenAI:
def opt_out_sponsor(
    ein: str,
    reason: ValidReason,
    first_name: str,
    last_name: str,
    plan_name: str,
    plan_phone: str,
) -> str:
    """
    Call this if the user asks to opt out and please confirm EIN and read it back to the user for confirmation.

    Args:
        ein (str): Employer Identification Number
        reason (ValidReason): The existing plan type that the user has and therefore their reason for opting out.
        first_name (str): The first name of the main contact for the sponsor.
        last_name (str): The last name of the main contact for the sponsor.
        plan_name (str): The name of the existing plan.
        plan_phone (str): The phone number of the existing plan.

    Returns:
        str: The response to be read to the user.
    """
    # does some API calls and returns "Say this verbatim: I'm sorry but I had trouble finalizing the exemption. Please try again later." if there's an error
It’s worth noting that the Realtime API seems to excel with simple names like “Michael” and “John”, but an EIN is trickier. An EIN is usually in the form 00-0000000. To be clear, the dash isn’t the issue here, since I can always filter it out with Python. The issue is that the EIN is often wrong in the function call even though the text transcription is great.
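For reference, this is roughly the kind of dash filtering and digit-count check I have in mind. It is only a sketch: normalize_ein, format_ein, and check_ein_argument are hypothetical names, and the retry wording is made up for illustration. It catches a dropped or extra digit but not a substituted one.

```python
import re
from typing import Optional


def normalize_ein(raw: str) -> Optional[str]:
    """Strip non-digit characters and return the EIN only if nine digits remain."""
    digits = re.sub(r"\D", "", raw)
    return digits if len(digits) == 9 else None


def format_ein(digits: str) -> str:
    """Format nine digits in the usual 00-0000000 form for reading back."""
    return f"{digits[:2]}-{digits[2:]}"


def check_ein_argument(raw: str) -> Optional[str]:
    """Return a retry message if the EIN is malformed, or None if it looks fine.

    The message text is a hypothetical example, not what my app actually returns.
    """
    if normalize_ein(raw) is None:
        return (
            "Say this verbatim: That EIN doesn't have nine digits. "
            "Could you repeat it one digit at a time?"
        )
    return None
```

Running a check like this at the top of opt_out_sponsor would at least stop a short EIN from reaching the API calls, though it doesn't explain why the function call differs from the transcript in the first place.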