Hello everyone, first time posting here. I really appreciate everyone’s help!
I’m currently developing a translation app using OpenAI’s Realtime API. The translation itself works impressively well, but I’ve been running into significant problems with transcription accuracy, and only with the Realtime API. Previously I used Whisper and GPT-4, and they worked perfectly. With the Realtime API, however, the transcribed text sometimes has no relation at all to the translated output, which is really strange.
For instance, when I send audio in a particular language, the translated text comes out correct and makes sense. The corresponding transcription, however, often shows text that matches neither the audio input nor the translation.
Has anyone else experienced similar issues with transcription in the Realtime API? I assumed there wouldn’t be any problems since it uses Whisper, but at times the transcript is badly wrong. Is there something I might be overlooking in my implementation, or could this be a problem with the API itself?
Any insights, suggestions, or guidance would be greatly appreciated!
Thank you!