Has any of you experienced wrong transcriptions when using input_audio_transcription with the whisper-1 model?
My clients complain that the Realtime API is not understanding their meaning and sometimes gives them wrong answers. When I checked the logs, the cause was wrong transcription from whisper-1.
What language are you speaking?
English.
Do you think it's because we're not native speakers?
It could be. I don't know.
One thing to note is that the transcription is not what the Realtime API model actually heard. It is voice-to-voice, so there is no way of knowing what it understood (short of asking it what was just said).
The transcription is a decoupled service that runs as a post-processing step on the audio.
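To see that decoupling in practice, here is a minimal sketch in Python using the websockets package: the model's spoken reply and the whisper-1 pass over your input arrive as two separate events. The event names below are my reading of the preview docs, so verify them against the current reference.

```python
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def listen():
    # note: websockets >= 14 renames extra_headers to additional_headers
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        async for raw in ws:
            event = json.loads(raw)
            if event["type"] == "response.audio_transcript.done":
                # Transcript of what the voice-to-voice model said back.
                print("model reply:", event["transcript"])
            elif event["type"] == "conversation.item.input_audio_transcription.completed":
                # The decoupled whisper-1 transcription of your input audio;
                # not necessarily what the core model itself "heard".
                print("whisper-1 heard:", event["transcript"])

asyncio.run(listen())
```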
The Realtime API and its underlying model(s) are in preview mode, so these kinds of insights would be useful for the developers.
Perhaps the problem is not in Whisper itself. I had similar problems with detecting the language in which audio was recorded: if the request consists of only a few words, even English can be recognized as, for example, French. There are no errors with long texts. Alternatively, default to English when the utterance is fewer than 5 words and look at the result; a sketch of that heuristic follows.
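A hedged sketch of that retry heuristic, using the standalone transcriptions endpoint (which does accept an ISO-639-1 language parameter); the 5-word threshold is just the rule of thumb from above:

```python
from openai import OpenAI

client = OpenAI()

def transcribe(path: str) -> str:
    # First pass: let whisper-1 auto-detect the language.
    with open(path, "rb") as f:
        first = client.audio.transcriptions.create(model="whisper-1", file=f)
    if len(first.text.split()) >= 5:
        return first.text
    # Short clips mislead language detection; retry with English forced.
    with open(path, "rb") as f:
        retry = client.audio.transcriptions.create(
            model="whisper-1", file=f, language="en"
        )
    return retry.text
```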
Yes, accuracy is very bad for non-English audio; it just transcribes in whatever language it guesses. Is there a way we can put a language code here? "input_audio_transcription": { "model": "whisper-1" }
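Since the standalone Whisper endpoint takes a language parameter, one thing worth testing is whether the Realtime session config honors the same field. A sketch of what that session.update could look like; the "language" key here is an assumption mirroring the Whisper API, not a documented Realtime option:

```python
import json

session_update = {
    "type": "session.update",
    "session": {
        "input_audio_transcription": {
            "model": "whisper-1",
            "language": "en",  # assumption: same ISO-639-1 hint Whisper accepts
        }
    },
}
# send over an already-open Realtime websocket:
# await ws.send(json.dumps(session_update))
```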
In my understanding, the whisper-1 transcription is merely an approximation of what was actually heard by the GPT-4 Omni model. On one hand that is good to have in case you want to do some processing on the side, but on the other hand it's difficult to correlate one with the other in terms of the behavior of the core realtime model.