When testing the recently released Realtime mode in the Playground, the text transcribed from my voice input is incorrect, even though gpt-4o still understands what I said.
Why? Is a separate speech-to-text step run in the background on the user's voice input, just to transcribe it and show it in the message history?