Has any of you experienced wrong transcriptions when using input_audio_transcription with the whisper-1 model?
My clients complain that the Realtime API is not understanding their meaning and sometimes gives them wrong answers. When I checked the logs, the cause was wrong transcription from whisper-1.
What language are you speaking?
English.
Do you think it's because we're not native speakers?
It could be. I don't know.
One thing to note is that the transcription is not what the Realtime API model actually heard. It is voice-to-voice, so there is no way of knowing what it understood (short of asking it what was just said).
The transcription is a decoupled service that runs as a post-processing step on the audio.
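To see that decoupling in practice, here is a minimal sketch in Python using the websockets package: the model's spoken reply and the whisper-1 pass over your input arrive as two separate events. The event names below are my reading of the preview docs, so verify them against the current reference.

```python
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def listen():
    # note: websockets >= 14 renames extra_headers to additional_headers
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        async for raw in ws:
            event = json.loads(raw)
            if event["type"] == "response.audio_transcript.done":
                # Transcript of what the voice-to-voice model said back.
                print("model reply:", event["transcript"])
            elif event["type"] == "conversation.item.input_audio_transcription.completed":
                # The decoupled whisper-1 transcription of your input audio;
                # not necessarily what the core model itself "heard".
                print("whisper-1 heard:", event["transcript"])

asyncio.run(listen())
```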
The Realtime API and its underlying model(s) are in preview mode, so these kinds of insights would be useful for the developers.
Perhaps the problem is not in Whisper itself. I had similar problems with detecting the language in which audio was recorded: if the request consists of only a few words, even English can be recognized as, for example, French. There are no errors with long texts. Alternatively, default to English when the utterance is fewer than 5 words and look at the result; a sketch of that heuristic follows.
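A hedged sketch of that retry heuristic, using the standalone transcriptions endpoint (which does accept an ISO-639-1 language parameter); the 5-word threshold is just the rule of thumb from above:

```python
from openai import OpenAI

client = OpenAI()

def transcribe(path: str) -> str:
    # First pass: let whisper-1 auto-detect the language.
    with open(path, "rb") as f:
        first = client.audio.transcriptions.create(model="whisper-1", file=f)
    if len(first.text.split()) >= 5:
        return first.text
    # Short clips mislead language detection; retry with English forced.
    with open(path, "rb") as f:
        retry = client.audio.transcriptions.create(
            model="whisper-1", file=f, language="en"
        )
    return retry.text
```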
Yes, accuracy is very bad for non-English audio; it just transcribes in whatever language it guesses. Is there a way we can put a language code here? "input_audio_transcription": { "model": "whisper-1" }
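Since the standalone Whisper endpoint takes a language parameter, one thing worth testing is whether the Realtime session config honors the same field. A sketch of what that session.update could look like; the "language" key here is an assumption mirroring the Whisper API, not a documented Realtime option:

```python
import json

session_update = {
    "type": "session.update",
    "session": {
        "input_audio_transcription": {
            "model": "whisper-1",
            "language": "en",  # assumption: same ISO-639-1 hint Whisper accepts
        }
    },
}
# send over an already-open Realtime websocket:
# await ws.send(json.dumps(session_update))
```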
In my understanding, the whisper-1 transcription is merely an approximation of what was actually heard by the GPT-4 Omni model. On one hand that is good to have in case you want to do some processing on the side, but on the other hand it's difficult to correlate one with the other in terms of the behavior of the core realtime model.