Realtime API: Assistant misunderstands even simple sentences

I have observed some quite strange behavior from the Realtime API: sometimes it completely misunderstands what the user says. When the user says “Yes.”, it understands it as “No.”. This happens a lot in French, and also in English, though less frequently.

I’m not talking about bad transcripts (the whisper-1 transcripts are generally quite good), but about how the realtime assistant actually understands what was said.

A typical example is the following:

Assistant: “Hello. I’m… Would you be interested?”
User: “Yes.”
Assistant: “Please let us know when you change your mind. Have a great day!”

I feel this happens more often with short utterances such as “Yes” or “Oui” (French for “yes”). If I say “Yes, of course”, it understands correctly.

Has anybody else observed this issue, and have you been able to find a workaround?

Thanks a lot in advance!

Does anybody have any ideas on this, please?