I’ve used the Realtime API and tweaked its parameters quite a bit, but I still have doubts about whether it’s a finished, fully usable product.
First of all, I feel that the instructions can’t be too long (750 characters seems to be the maximum); beyond that the model gets lost and confused.
There’s always a voice cut-off. I think the model is listening to itself speak and mistakenly thinks it’s the user. The result is an endless loop of interruptions.
Also, the speech transcription is not very good, and sometimes it even detects French or Korean when I’m speaking in English. I don’t see any setting to restrict detection to a specific language (though the question has already been asked on this forum).
Have you had similar problems, and if so, have you found a way to solve them?
Do you have any recommendations?
For the voice cutting off, this seems to be a common issue. Even the official advanced voice mode does this.
You should be able to inspect the logs to determine whether it’s cutting off the voice as a result of hearing itself. There are also methods available to prevent this from happening.
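One such method is tightening the server-side voice activity detection so quiet bleed-through (such as the model’s own output echoing back through the microphone) isn’t treated as user speech. A minimal sketch, assuming the Realtime API’s `session.update` event with `server_vad` turn detection; the specific values are illustrative starting points, not recommendations:

```python
import json

# Sketch: a session.update event that makes server-side VAD less trigger-happy,
# so echo of the model's own voice is less likely to count as an interruption.
session_update = {
    "type": "session.update",
    "session": {
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.7,           # higher = ignore quieter audio (e.g. echo)
            "prefix_padding_ms": 300,   # audio retained before detected speech
            "silence_duration_ms": 700, # wait longer before ending the user's turn
        }
    },
}

# This payload would be sent over the Realtime WebSocket, e.g. ws.send(payload).
payload = json.dumps(session_update)
```

Raising `threshold` and `silence_duration_ms` trades a little responsiveness for fewer false interruptions; proper acoustic echo cancellation on the client is still the more robust fix.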
In terms of transcription issues: yes, this is a common issue with Whisper, which attempts to classify the audio’s language before performing transcription/translation. You may be better off handling the transcription yourself, especially since I believe you are charged for the built-in one either way.
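If you do handle transcription yourself, you can pin the language up front rather than relying on Whisper’s auto-detection (the step that misfires as French or Korean). A sketch, assuming the OpenAI SDK’s transcription parameters; `build_transcribe_kwargs` is a hypothetical helper and the actual client call is omitted:

```python
def build_transcribe_kwargs(language: str = "en") -> dict:
    """Build request parameters for a Whisper transcription call with the
    input language pinned, bypassing automatic language detection."""
    return {
        "model": "whisper-1",
        "language": language,        # ISO-639-1 hint, e.g. "en"
        "response_format": "text",
    }

# Usage (illustrative): client.audio.transcriptions.create(
#     file=audio_file, **build_transcribe_kwargs("en"))
```

Supplying the `language` hint is also supposed to improve accuracy and latency, since the model skips the detection pass.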
For instructions: ideally, instructions for the Realtime API model should be kept as lightweight as possible. This is inherent in any pioneering model: it’s all about micromanaging until abundance becomes available.
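One lightweight way to enforce that is a guard on instruction length before opening the session. The 750-character ceiling below is the limit observed earlier in this thread, not a documented one; treat it as a tunable budget, and `check_instructions` as a hypothetical helper:

```python
INSTRUCTION_BUDGET = 750  # observed ceiling from this thread, not documented

def check_instructions(instructions: str, budget: int = INSTRUCTION_BUDGET) -> str:
    """Fail fast if the system instructions exceed the working budget,
    instead of letting the model silently get lost mid-session."""
    if len(instructions) > budget:
        raise ValueError(
            f"instructions are {len(instructions)} chars; "
            f"keep them under {budget}"
        )
    return instructions
```

Failing fast at startup is easier to debug than a model that quietly ignores the tail of an overlong prompt.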
It is in a preview stage, not production. It’s great for a PoC, but don’t let anyone tell you it’s fully functional in production.