Trouble mapping realtime speech to function call text

Hey there,

Right now I’m using the Realtime API to call simple functions intelligently. I used the Twilio + OpenAI app as a template.

I’ve noticed that for functions with multiple inputs, the Realtime API often misses a digit or two of an input. The problem is that when I speak the input again, the Realtime API often stubbornly sticks to its original value.

I log conversation.item.input_audio_transcription.completed events, and it’s clear the model does hear what I’m saying, but for some reason that text doesn’t make it directly into the function call arguments.
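
For reference, the logging amounts to something like this (simplified from the Twilio + OpenAI template’s websocket loop; the event and field names are the ones from the Realtime API docs, the handler shape is only illustrative):

import json

async def handle_openai_event(message: str) -> None:
    event = json.loads(message)
    if event.get("type") == "conversation.item.input_audio_transcription.completed":
        # The transcript here is accurate, even when the function call args are not.
        print("Heard:", event.get("transcript"))
    elif event.get("type") == "response.function_call_arguments.done":
        # This is where a digit or two of the EIN ends up mangled.
        print("Function call args:", event.get("arguments"))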

Has anyone else had this same issue?

For example, here’s one of my functions registered with OpenAI:

def opt_out_sponsor(
    ein: str,
    reason: ValidReason,
    first_name: str,
    last_name: str,
    plan_name: str,
    plan_phone: str,
) -> str:
    """
    Call this if the user asks to opt out. Confirm the EIN by reading it back to the user before calling.

    Args:
        ein (str): Employer Identification Number
        reason (ValidReason): The existing plan type that the user has and therefore their reason for opting out.
        first_name (str): The first name of the main contact for the sponsor.
        last_name (str): The last name of the main contact for the sponsor.
        plan_name (str): The name of the existing plan.
        plan_phone (str): The phone number of the existing plan.

    Returns:
        str: The response to be read to the user.
    """
    # Makes some API calls and, if there's an error, returns
    # "Say this verbatim: I'm sorry but I had trouble finalizing the exemption. Please try again later."
    ...

It’s worth noting that the OpenAI Realtime API seems to excel with simple names like “Michael” and “John”, but an EIN is trickier. An EIN is usually in the form 00-0000000. To be clear, the dash isn’t the issue, since I can always filter it out with Python. The issue is that the EIN is often wrong in the function call even though the text transcription is great.
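
(Filtering the dash is trivial, something like the snippet below; normalize_ein is just an illustrative helper name. The point is that the cleanup is easy, while the digits themselves being wrong is the real problem.)

import re

def normalize_ein(raw: str) -> str | None:
    """Keep only digits and sanity-check that the EIN has exactly nine."""
    digits = re.sub(r"\D", "", raw)
    return digits if len(digits) == 9 else None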

There are a few options I would consider here.

First, though, what exactly is your function-call schema/definition? That is, the actual JSON schema you’re giving the OpenAI model?

https://platform.openai.com/docs/guides/function-calling#defining-functions
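
For context, a Realtime tool entry for a function like yours would be shaped roughly like this (trimmed to just the EIN parameter; the description text and pattern are examples of mine, not your actual schema):

{
  "type": "function",
  "name": "opt_out_sponsor",
  "description": "Opt a plan sponsor out. Always read the EIN back to the caller and confirm it before calling.",
  "parameters": {
    "type": "object",
    "properties": {
      "ein": {
        "type": "string",
        "description": "Employer Identification Number: exactly nine digits, formatted NN-NNNNNNN.",
        "pattern": "^\\d{2}-\\d{7}$"
      }
    },
    "required": ["ein"]
  }
}

A tighter description on the ein property is the first tweak I’d try; the pattern constraint is standard JSON Schema, though the model may or may not honor it strictly.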

I would first split the EIN request into its own function call, separate from everything else you’re trying to do, so we can isolate it and figure out how to tweak the tool to produce the intended output. If the text transcription is good, it may also be worth parsing that transcription yourself and feeding the result into your function directly; a rough sketch of that idea is below.
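
(The names ein_from_transcript, handle_function_call, and last_transcript here are mine; last_transcript would be whatever you saved from the most recent input_audio_transcription.completed event, and any reason/ValidReason conversion is glossed over.)

import json
import re

def ein_from_transcript(transcript: str) -> str | None:
    """Pull a nine-digit EIN out of the transcript, ignoring dashes and spaces.
    Assumes the transcription renders the EIN as digits rather than words."""
    match = re.search(r"\b(\d{2})[-\s]?(\d{7})\b", transcript)
    return f"{match.group(1)}-{match.group(2)}" if match else None

def handle_function_call(arguments_json: str, last_transcript: str) -> str:
    args = json.loads(arguments_json)
    # Prefer the EIN heard in the transcription over the model-generated argument.
    corrected = ein_from_transcript(last_transcript)
    if corrected is not None:
        args["ein"] = corrected
    return opt_out_sponsor(**args)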