Realtime API not understanding phone numbers

I tried Realtime in the playground and it seems to understand emails, phone numbers, etc. pretty well. But when I tried the API with Twilio, it doesn't seem to understand them. Does anyone have similar issues? Curious why this happens. Any thoughts are appreciated!


Can you share more details about what you’re trying to do? You said you’re accessing the API via Twilio?


Yeah, basically I’m following the Twilio tutorial: using Twilio and the OpenAI Realtime API to build an AI voice assistant over a websocket.
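For context, the core of that tutorial is forwarding Twilio Media Stream frames into the Realtime API as audio-append events. A rough sketch of just that translation step (event names per the Twilio Media Streams and OpenAI Realtime docs; the surrounding websocket plumbing is omitted):

```javascript
// Translate one Twilio Media Stream "media" message into an OpenAI
// Realtime "input_audio_buffer.append" event. Twilio sends G.711 u-law
// audio, base64-encoded, which the Realtime API can accept directly
// when the session's input_audio_format is set to "g711_ulaw".
function twilioMediaToRealtimeEvent(twilioMessage) {
    const msg = JSON.parse(twilioMessage);
    if (msg.event !== 'media') return null; // ignore "start", "mark", etc.
    return JSON.stringify({
        type: 'input_audio_buffer.append',
        audio: msg.media.payload, // already base64 u-law; no re-encoding
    });
}
```

The point is that no audio transcoding happens in between, which is why audio quality alone doesn't obviously explain the mishearing.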

see issue here, any updates?

Anyone else seeing this issue? And if so, how did you resolve it?

Details:

  • API call is wrapped by twilio
  • Consistently gets names wrong during a call “I’m Saul” => “Hi, Ahmed” and does not change after correcting “I’m Saul” => “Ok Ahmed”
  • Repeats back phone numbers incorrectly (many digits off)
  • Our transcription service (Deepgram) detects details accurately so it doesn’t seem like an audio-quality issue

We’ve added this into our prompt to make the AI more receptive to corrections, although oftentimes it’ll respond with a different name or number:

  • If the user corrects a detail or piece of information, be sure to acknowledge the new information and use it going forward (e.g. their name)
  • When taking phone numbers, if the caller says numbers or makes a correction, be sure to make the change
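In case it helps anyone wire this up, those two rules can be pushed into the model as session instructions at call start. A sketch over the raw Realtime websocket (`ws` is assumed to be the open connection; the event shape follows the Realtime `session.update` docs):

```javascript
// Build a session.update event carrying the correction-handling rules.
// This would be sent once over the open Realtime websocket ("ws")
// right after the call starts.
const instructions = [
    'If the user corrects a detail or piece of information (e.g. their name),',
    'acknowledge the new information and use it going forward.',
    'When taking phone numbers, repeat them back digit by digit and apply',
    'any correction the caller makes.',
].join(' ');

const sessionUpdate = {
    type: 'session.update',
    session: { instructions },
};

// ws.send(JSON.stringify(sessionUpdate));
```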

I’m seeing the same class of issues. Speech-to-text transcribes correctly, but voice-to-voice struggles.

Also having the same issue with names: voice-to-text (VtT) recognises the name correctly (printed in the logs), but text-to-voice (TtV) just gets the name completely wrong.

Same issue here: wrong phone number, address, and name.

Same class of issues here as well - surprisingly bad accuracy.

Any theories on what’s going on?

Have the same issue. Anybody have an idea how to mitigate this?

I’ve got the same issue; the Realtime API is unusable for me because of it.

It’s crazy how fundamentally wrong it gets names. I haven’t had transcription this bad with any speech model I’ve used until today.

I hope there’s an update or at least a fix coming from OpenAI. My clients would love to get the Realtime API integrated into their systems; for them it’s not even the pricing, but the sheer unreliability of how the API transcribes names.

So a solution would be highly appreciated, since I would love to surprise them with the realtime agent working properly for their use case.

Same issue, gets first and last names horribly wrong. Had one where “David Olijuler” was transcribed as “Jason King”. Haven’t found a fix yet.

Same here - we have been experiencing the issues listed in this thread for months. Our use case is building an AI answering service:

  • Cannot understand phone numbers (can’t accurately repeat them back)
  • Cannot understand spelled-out email addresses (can’t accurately repeat them back; hallucinates)
  • If the conversation goes on, it starts addressing the caller by the wrong name

Strangely, when using the exact same prompt on another platform like ElevenLabs Conversational or VAPI with GPT-4o, we have no issues at all and no hallucinations.

I’ll tell you how I achieve 40-60% accuracy here.

The chunk size I send for this particular client is 5k, so there are no interruptions.
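Roughly, the buffering looks like this (the 5000-byte threshold is just what I use for this client, not an API requirement; the flush callback is where the batch would be forwarded to the API):

```javascript
// Accumulate inbound audio chunks and only forward them once roughly
// 5 KB has built up, so the model hears longer uninterrupted stretches
// instead of many tiny fragments.
class AudioChunker {
    constructor(flush, threshold = 5000) {
        this.flush = flush;         // called with one Buffer per ~5 KB batch
        this.threshold = threshold; // bytes to accumulate before flushing
        this.pending = [];
        this.size = 0;
    }
    push(chunk) {
        this.pending.push(chunk);
        this.size += chunk.length;
        if (this.size >= this.threshold) {
            this.flush(Buffer.concat(this.pending));
            this.pending = [];
            this.size = 0;
        }
    }
}
```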

Function prompt:

  • Put your utmost attention on every single number; group them individually.
    - Trailing zeros included (R4889800). For the API call we form the reference starting with R and placing the numbers after it, e.g. R4889800.
    - Say "I am looking up the property for you" while CALLING THE FUNCTION once you have heard the complete reference.
    - Once you get the reference, repeat each number ACCURATELY, without inventing anything, back to the client before launching the query, to confirm you got it right.
    - SPEAK IMMEDIATELY when you get the API response.
    - Describe the property briefly.
    - If the reference is not correct, clean up and restart the flow with the new reference.

And this: USE YOUR BEST TEMPERATURE to hear and repeat the numbers without mistakes.
But I have no clue whether it actually does anything.

I’ve set the default temperature to 0.6.
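If you’re on the raw websocket rather than a client library, the temperature goes in the session config; a minimal sketch (`ws` is assumed to be the open Realtime connection; the event shape follows the `session.update` docs):

```javascript
// Set the session's sampling temperature once at call start.
// Lower values seem to help with digit accuracy in my testing.
const sessionUpdate = {
    type: 'session.update',
    session: { temperature: 0.6 },
};

// ws.send(JSON.stringify(sessionUpdate));
```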

Hey,

I’m playing around with a potential solution. Still in development but might help you out.

Since the Whisper user transcript gets the name correctly, but the issue is with the Realtime API’s own understanding, I created a tool and told the model to run it whenever the user spells out their name.

So what I did, is add the following tool:

realtimeClient.addTool(
    {
        name: 'getSpelledName',
        description: 'Fetches the last saved user transcript containing a name spelled by the user.',
        parameters: {
            type: 'object',
            properties: {},
        },
    },
    // The handler ignores any model-supplied arguments and returns the
    // transcript saved from the conversation events below; that way the
    // Whisper transcript is trusted over what the model thinks it heard.
    async () => {
        return { transcript: lastUserTranscript };
    }
);

This is how I save the user transcript for the tool:

let lastUserTranscript = ''; // read by the getSpelledName tool handler

realtimeClient.on('conversation.updated', ({ item, delta }) => {
    if (item.type === 'message' && item.role === 'user' && item.formatted.transcript) {
        lastUserTranscript = item.formatted.transcript;
    }
});

And this is how I tell the assistant to call the tool whenever the customer spells their name:

# Instructions
- Whatever the user's question is, always start by asking for the person's full name and birthdate.
- Always ask the user to spell out their last name.
- Whenever the user spells their name, call the "getSpelledName" tool to retrieve it.

Hope this helps!