RealTime API Transcription errors

edo444 · October 11, 2024, 8:59am

Hello everyone, this is my first time posting here. I really appreciate everyone's help!

I'm currently developing a translation app using OpenAI's Realtime API. The translation itself works impressively well, but transcription accuracy has been a significant problem, and only with the Realtime API. I previously used Whisper and GPT-4, and that combination worked perfectly. With the Realtime API, however, the transcribed text sometimes has nothing to do with the translated output. It is completely unrelated and off-base, which is really strange.

For instance, when I send audio in a particular language, the translated text comes out correct and makes sense, but the corresponding transcription often fails badly, showing text that matches neither the audio input nor the translation.
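When the transcript and the translation disagree, it helps to check which event each piece of text comes from. In the Realtime API the transcript of your input audio is produced separately from the model's reply: it comes from the model named in input_audio_transcription (whisper-1 here) and arrives in its own conversation.item.input_audio_transcription.completed event, while the text of a spoken reply arrives in response.audio_transcript.done (or response.text.done for text-only output). The API reference describes the input transcription as running asynchronously and as rough guidance rather than what the model itself understood, which would fit a correct translation paired with a wrong transcript. A minimal sketch, assuming you keep the raw JSON server messages in a list; collect_transcripts is a hypothetical helper, and the event field names should be checked against the current reference:

```python
import json


def collect_transcripts(raw_events):
    """Separate the input-audio transcripts from the model's own output.

    raw_events: iterable of JSON strings received from the Realtime
    websocket (assumed to be server events, one per message).
    """
    inputs = {}    # item_id -> Whisper transcript of the user's audio
    failures = {}  # item_id -> error message if that transcription failed
    outputs = []   # text of the model's responses (the translations)
    for raw in raw_events:
        event = json.loads(raw)
        etype = event.get("type")
        if etype == "conversation.item.input_audio_transcription.completed":
            inputs[event["item_id"]] = event["transcript"]
        elif etype == "conversation.item.input_audio_transcription.failed":
            failures[event["item_id"]] = event.get("error", {}).get("message", "")
        elif etype == "response.audio_transcript.done":
            outputs.append(event["transcript"])  # reply spoken as audio
        elif etype == "response.text.done":
            outputs.append(event["text"])  # reply returned as text only
    return inputs, failures, outputs
```

Logging the two side by side makes it easier to tell whether the model misheard the audio or only the separate transcription step did.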

Has anyone else seen similar issues with transcription? I assumed there wouldn't be any problems since it uses Whisper, but it fails badly at times. Is there something I might be overlooking in my implementation, or is this a problem with the API itself?

Any insights, suggestions, or guidance would be greatly appreciated!

Thank you!

maig · October 11, 2024, 5:52pm

It happened to me all the time. The transcription of my input made no sense, and the model's responses made no sense either: it would bring up random topics that weren't part of my input.

anon10827405 · October 11, 2024, 5:54pm

The first thing to try is saving the buffer locally and running it through Whisper, to see whether the problem lies with the service or with the format of your audio.

I would imagine that if the audio format differs from what the API expects, you'd get nonsensical responses.
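A minimal sketch of that check, assuming the session uses the default pcm16 input format (16-bit signed little-endian PCM, mono, 24 kHz) and that you kept the base64 strings you sent with input_audio_buffer.append. save_buffer_as_wav is a hypothetical helper; it writes a WAV file you can listen to or send to Whisper:

```python
import base64
import wave

# Assumed format: the Realtime API's "pcm16" input
# (16-bit signed little-endian PCM, mono, 24 kHz).
SAMPLE_RATE = 24_000
CHANNELS = 1
SAMPLE_WIDTH = 2  # bytes per sample (16-bit)


def save_buffer_as_wav(b64_chunks, path):
    """Decode the chunks sent via input_audio_buffer.append into a WAV file.

    Returns the number of audio frames written.
    """
    pcm = b"".join(base64.b64decode(chunk) for chunk in b64_chunks)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)  # WAV stores 16-bit samples little-endian
    return len(pcm) // (SAMPLE_WIDTH * CHANNELS)
```

If the saved file sounds sped up or slowed down on playback, the rate the audio was captured at does not match the 24 kHz this format assumes.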

maig · October 11, 2024, 6:45pm

Hey, thank you! I downloaded my audio and found out it was playing in slow motion, lol. Got that fixed now.
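The post doesn't say what the fix was. One common cause of slowed-down or sped-up audio is a sample-rate mismatch, for example capturing at 48 kHz and sending those samples as 24 kHz pcm16. A minimal sketch of converting the rate, assuming mono 16-bit little-endian input; resample_pcm16 is a hypothetical helper that uses plain linear interpolation with no anti-aliasing filter, so a proper resampling library is preferable in production:

```python
import array
import sys


def resample_pcm16(pcm: bytes, src_rate: int, dst_rate: int = 24_000) -> bytes:
    """Resample mono 16-bit little-endian PCM by linear interpolation."""
    if src_rate == dst_rate:
        return pcm
    samples = array.array("h")
    samples.frombytes(pcm)
    if sys.byteorder == "big":
        samples.byteswap()  # input is little-endian; convert to native order
    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # input samples advanced per output sample
    out = array.array("h")
    for i in range(n_out):
        pos = i * step
        j = int(pos)
        frac = pos - j
        nxt = samples[min(j + 1, len(samples) - 1)]
        # Interpolate between neighbouring samples; stays within int16 range.
        out.append(int(round(samples[j] + (nxt - samples[j]) * frac)))
    if sys.byteorder == "big":
        out.byteswap()  # back to little-endian for the API
    return out.tobytes()
```

For example, resample_pcm16(mic_bytes, 48_000) halves the number of samples before they are base64-encoded and appended (mic_bytes stands in for your captured audio).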

manoranjan.rajguru14 · November 6, 2024, 5:25pm

Facing the same issue. The quality of the output transcription is very bad. Is there any way to pass a language in "input_audio_transcription": { "model": "whisper-1" }?
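On passing a language: at the time of this post the documented input_audio_transcription object only took a model, but later versions of the Realtime API reference list optional language (an ISO-639-1 code) and prompt fields. The sketch below assumes your session accepts them; build_session_update is a hypothetical helper, and if the server rejects the extra keys, send only the model:

```python
import json


def build_session_update(language=None, prompt=None, model="whisper-1"):
    """Build a session.update event that configures input transcription.

    The "language" and "prompt" keys are assumptions based on later
    versions of the Realtime API reference; older sessions may only
    accept "model".
    """
    transcription = {"model": model}
    if language:
        transcription["language"] = language  # ISO-639-1 code, e.g. "th"
    if prompt:
        transcription["prompt"] = prompt  # optional vocabulary/style hint
    return json.dumps({
        "type": "session.update",
        "session": {"input_audio_transcription": transcription},
    })
```

It would be sent like any other client event, e.g. await websocket.send(build_session_update(language="vi")).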

phtrungit · November 13, 2024, 3:20am

I'm experiencing the same issue. It seems to work well with English, but for other languages such as Thai and Vietnamese, especially with short audio inputs, it doesn't: it doesn't translate my input correctly, and the output is often nonsensical.

jgeiger2 · December 27, 2024, 4:58pm

I have dysarthria due to cerebral palsy. The public ChatGPT understands me perfectly, but the Realtime API does not. What gives??

m.d · January 9, 2025, 11:06am

I generated a WAV file and passed it to whisper-1, which transcribed it correctly.
Then I generated the base64-encoded data as follows:

enc = base64.b64encode(open(audio_file, "rb").read()).decode("utf-8")

I passed this to the Realtime API as follows:
await websocket.send(json.dumps({"type": "input_audio_buffer.append", "audio": enc}))

I base64-decoded this data again, stored it in a separate file, and sent that file to whisper-1 again. This works fine too.

But the Realtime API fails to understand the audio.
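One possible culprit, though the post doesn't confirm it: the snippet base64-encodes the whole .wav file, header included, whereas with the default pcm16 input format the buffer expects raw 16-bit mono samples at 24 kHz. Whisper copes because it reads the file header; the Realtime buffer has no header to read. The header itself is only a few dozen bytes, so a sample rate or channel count other than 24 kHz mono is the more likely problem. A minimal sketch, assuming pcm16 input; wav_to_append_events is a hypothetical helper that reads only the sample frames with Python's wave module, checks the format, and splits the audio into several append events followed by a commit:

```python
import base64
import json
import wave

EXPECTED_RATE = 24_000  # assumed pcm16 input: 24 kHz, mono, 16-bit


def wav_to_append_events(path, chunk_frames=EXPECTED_RATE // 10):
    """Turn a WAV file into Realtime client events (as JSON strings).

    Emits one input_audio_buffer.append per chunk (about 100 ms each by
    default), then a single input_audio_buffer.commit.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono audio")
        if wav.getframerate() != EXPECTED_RATE:
            raise ValueError(
                f"sample rate is {wav.getframerate()} Hz; "
                f"resample to {EXPECTED_RATE} Hz first"
            )
        events = []
        while True:
            frames = wav.readframes(chunk_frames)  # raw samples, no header
            if not frames:
                break
            events.append(json.dumps({
                "type": "input_audio_buffer.append",
                "audio": base64.b64encode(frames).decode("ascii"),
            }))
    events.append(json.dumps({"type": "input_audio_buffer.commit"}))
    return events
```

Each event would then be sent with await websocket.send(event). If server turn detection is turned off for the session, a response.create event is also needed after the commit before the model replies.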

I even tried setting the transcription model in the session object:

"input_audio_transcription": {"model": "whisper-1"}

But the behavior didn't change.

Clearly something is not working with the Realtime API. I tried both models, gpt-4o-realtime-preview-2024-10-01 and gpt-4o-mini-realtime-preview-2024-12-17, with no luck.

Is this a problem with the API, or am I doing something incorrectly?
