I have observed quite strange behavior from the realtime API: sometimes it completely misunderstands what the user says. When the user says “Yes.”, it is understood as “No.”. This happens a lot with French, and also with English, though less frequently.
I’m not talking about bad transcripts (the transcripts with whisper-1 are quite good in general) but about what the realtime assistant itself understands.
A typical example is the following:
Assistant: “Hello. I’m… Would you be interested?”
User: “Yes.”
Assistant: “Please let us know when you change your mind. Have a great day!”
I feel that this happens more often with short utterances such as “Yes” or “Oui” (in French). If I say “Yes of course”, then it understands correctly.
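To check how often this really happens, the one-word replies could be flagged in the conversation logs and compared by hand against what the assistant answered. Below is a minimal sketch of such a check. The function name and the word lists are my own assumptions for illustration; nothing here is part of the realtime API, and it only looks at the transcript text.

```python
import string

# Hypothetical word lists for illustration; extend them for your own use case.
AFFIRMATIVE = {"yes", "yeah", "yep", "oui", "ouais"}
NEGATIVE = {"no", "nope", "non"}

# Translation table that deletes ASCII punctuation such as "." or "!".
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def classify_short_reply(transcript: str) -> str:
    """Return 'yes' or 'no' for a one-word reply, and 'other' for anything else."""
    words = transcript.lower().translate(_STRIP_PUNCTUATION).split()
    if len(words) != 1:
        # Longer replies such as "Yes of course" are not the case I am tracking.
        return "other"
    if words[0] in AFFIRMATIVE:
        return "yes"
    if words[0] in NEGATIVE:
        return "no"
    return "other"
```

Any turn where this returns "yes" but the assistant then ends the conversation is a candidate for the issue described above.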
Has anybody else observed this issue, and if so, have you been able to find a workaround?
Thanks a lot in advance!