Realtime API not understanding phone numbers

I tried Realtime in the playground and it seems to understand emails, phone numbers, etc. pretty well. But when I tried the API with Twilio, it doesn't seem to understand them. Does anyone have similar issues? Curious why this happens. Any thoughts are appreciated!


Can you share more details about what you’re trying to do? You said you’re accessing the API via Twilio?


Yeah, basically I’m following the Twilio tutorial: using Twilio and the OpenAI Realtime API to build an AI voice assistant over a websocket.
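For context, the core of that tutorial is forwarding Twilio Media Stream frames into the Realtime API as audio-append events. A rough sketch of just that translation step (event names per the Twilio Media Streams and OpenAI Realtime docs; the surrounding websocket plumbing is omitted):

```javascript
// Translate one Twilio Media Stream "media" message into an OpenAI
// Realtime "input_audio_buffer.append" event. Twilio sends G.711 u-law
// audio, base64-encoded, which the Realtime API can accept directly
// when the session's input_audio_format is set to "g711_ulaw".
function twilioMediaToRealtimeEvent(twilioMessage) {
    const msg = JSON.parse(twilioMessage);
    if (msg.event !== 'media') return null; // ignore "start", "mark", etc.
    return JSON.stringify({
        type: 'input_audio_buffer.append',
        audio: msg.media.payload, // already base64 u-law; no re-encoding
    });
}
```

The point is that no audio transcoding happens in between, which is why audio quality alone doesn't obviously explain the mishearing.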

see issue here, any updates?

Anyone else seeing this issue? And if so, how did you resolve it?

Details:

  • API call is wrapped by twilio
  • Consistently gets names wrong during a call “I’m Saul” => “Hi, Ahmed” and does not change after correcting “I’m Saul” => “Ok Ahmed”
  • Repeats back phone numbers incorrectly (many digits off)
  • Our transcription service (Deepgram) detects details accurately so it doesn’t seem like an audio-quality issue

We’ve added this into our prompt to make the AI more receptive to corrections, although oftentimes it’ll respond with a different name or number:

  • If the user corrects a detail or piece of information, be sure to acknowledge the new information and use it going forward (e.g. their name)
  • When taking phone numbers, if the caller says numbers or makes a correction, be sure to make the change
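In case it helps anyone wire this up, those two rules can be pushed into the model as session instructions at call start. A sketch over the raw Realtime websocket (`ws` is assumed to be the open connection; the event shape follows the Realtime `session.update` docs):

```javascript
// Build a session.update event carrying the correction-handling rules.
// This would be sent once over the open Realtime websocket ("ws")
// right after the call starts.
const instructions = [
    'If the user corrects a detail or piece of information (e.g. their name),',
    'acknowledge the new information and use it going forward.',
    'When taking phone numbers, repeat them back digit by digit and apply',
    'any correction the caller makes.',
].join(' ');

const sessionUpdate = {
    type: 'session.update',
    session: { instructions },
};

// ws.send(JSON.stringify(sessionUpdate));
```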

I’m seeing the same class of issues. Speech-to-text transcribes correctly, but voice-to-voice struggles.

Also having the same issue with names: voice-to-text (VtT) recognises the name correctly (printed in the logs), but text-to-voice (TtV) just gets the name completely wrong.

Same issue here: wrong phone number, address, and name.

Same class of issues here as well - surprisingly bad accuracy.

Any theories on what’s going on?

Have the same issue. Anybody have an idea how to mitigate this?

I’ve got the same issue; the Realtime API is unusable for me because of it.

It’s crazy how fundamentally wrong it gets names. I haven’t had transcription this bad with any speech model I’ve used until today.

I hope there’s an update or at least a fix coming from OpenAI. My clients would love to get the Realtime API integrated into their systems; for them it’s not even the pricing, but the sheer unreliability of how the API transcribes names.

So a solution would be highly appreciated, since I would love to surprise them with the realtime agent working properly for their use case.

Same issue, gets first and last names horribly wrong. Had one where “David Olijuler” was transcribed as “Jason King”. Haven’t found a fix yet.

Same here - we have been experiencing the issues listed in this thread for months. Our use case is building an AI answering service:

  • Cannot understand phone numbers (can’t accurately repeat them back)
  • Cannot understand spelled-out email addresses (can’t accurately repeat them back; hallucinates)
  • If the conversation goes on, it starts addressing the caller by the wrong name

Strangely, when using the exact same prompt on another platform like ElevenLabs Conversational or VAPI with GPT-4o, we have no issues at all and no hallucinations.

I’ll tell you how I achieve 40-60% accuracy here.

The chunk size I send for this particular client is 5k, so there are no interruptions.
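Roughly, the buffering looks like this (the 5000-byte threshold is just what I use for this client, not an API requirement; the flush callback is where the batch would be forwarded to the API):

```javascript
// Accumulate inbound audio chunks and only forward them once roughly
// 5 KB has built up, so the model hears longer uninterrupted stretches
// instead of many tiny fragments.
class AudioChunker {
    constructor(flush, threshold = 5000) {
        this.flush = flush;         // called with one Buffer per ~5 KB batch
        this.threshold = threshold; // bytes to accumulate before flushing
        this.pending = [];
        this.size = 0;
    }
    push(chunk) {
        this.pending.push(chunk);
        this.size += chunk.length;
        if (this.size >= this.threshold) {
            this.flush(Buffer.concat(this.pending));
            this.pending = [];
            this.size = 0;
        }
    }
}
```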

Function prompt:

  • Put your utmost attention on every single number; group them individually.
    - Trailing zeros included (R4889800). For the API call we form the reference starting with R and placing the numbers after it, e.g. R4889800.
    - Say "I am looking up the property for you" while CALLING THE FUNCTION once you have heard the complete reference.
    - Once you get the reference, repeat each number ACCURATELY, without inventing anything, back to the client before launching the query, to confirm you got it right.
    - SPEAK IMMEDIATELY when you get the API response.
    - Describe the property briefly.
    - If the reference is not correct, clean up and restart the flow with the new reference.

And this: USE YOUR BEST TEMPERATURE to hear and repeat the numbers without mistakes.
But I have no clue whether it actually does anything.

I’ve set the default temperature to 0.6.
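If you’re on the raw websocket rather than a client library, the temperature goes in the session config; a minimal sketch (`ws` is assumed to be the open Realtime connection; the event shape follows the `session.update` docs):

```javascript
// Set the session's sampling temperature once at call start.
// Lower values seem to help with digit accuracy in my testing.
const sessionUpdate = {
    type: 'session.update',
    session: { temperature: 0.6 },
};

// ws.send(JSON.stringify(sessionUpdate));
```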

Hey,

I’m playing around with a potential solution. Still in development but might help you out.

Since the Whisper user transcript gets the name correctly, but the issue is with the Realtime API’s own understanding, I created a tool and told the model to run it whenever the user spells out their name.

So what I did, is add the following tool:

realtimeClient.addTool(
    {
        name: 'getSpelledName',
        description: 'Fetches the last saved user transcript containing a name spelled by the user.',
        parameters: {
            type: 'object',
            properties: {},
        },
    },
    // The handler ignores any model-supplied arguments and returns the
    // transcript saved from the conversation events below; that way the
    // Whisper transcript is trusted over what the model thinks it heard.
    async () => {
        return { transcript: lastUserTranscript };
    }
);

This is how I save the user transcript for the tool:

let lastUserTranscript = ''; // read by the getSpelledName tool handler

realtimeClient.on('conversation.updated', ({ item, delta }) => {
    if (item.type === 'message' && item.role === 'user' && item.formatted.transcript) {
        lastUserTranscript = item.formatted.transcript;
    }
});

And this is how I tell the assistant to call the tool whenever the customer spells their name:

# Instructions
- Whatever the user's question is, always start by asking for the person's full name and birthdate.
- Always ask the user to spell out their last name.
- Whenever the user spells their name, call the "getSpelledName" tool to retrieve it.

Hope this helps!