Realtime API is extremely fast

I have been testing the Realtime API over the past few days and noticed that token generation runs much faster than real time, particularly in audio mode. This makes the “response.cancel” event largely redundant: by the time a cancel is issued, all tokens have already been generated and delivered, well before the audio has finished playing back. While the multimodal nature of the Realtime API may justify this behavior in text mode, it creates problems in audio mode, where tokens should ideally be generated at a pace aligned with the actual playback time of the audio.

This fast token generation creates a practical problem: when a user interrupts the model mid-sentence, the remaining tokens have already been delivered and billed, which is particularly concerning given the high cost of this API.
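For context, here is a minimal sketch of the client events involved when a user interrupts playback, based on my reading of the beta docs. The event names (`response.cancel`, `conversation.item.truncate`) come from the API reference; the `item_id` and `audio_end_ms` values are placeholders. Note that `response.cancel` does nothing to recover tokens that were already generated, which is exactly the issue:

```python
import json

def interruption_events(item_id: str, audio_end_ms: int) -> list[str]:
    """Build the client events sent over the Realtime API WebSocket
    when the user interrupts mid-playback. Values are illustrative."""
    # Stop the in-flight response. Because generation outpaces playback,
    # the tokens are usually already delivered (and billed) by now.
    cancel = {"type": "response.cancel"}
    # Truncate the assistant item at the point the user actually heard,
    # so the unplayed audio is dropped from the conversation context.
    truncate = {
        "type": "conversation.item.truncate",
        "item_id": item_id,
        "content_index": 0,
        "audio_end_ms": audio_end_ms,
    }
    return [json.dumps(cancel), json.dumps(truncate)]

for event in interruption_events("item_123", 1500):
    print(event)
```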

I attempted to work around this by capping the output tokens and requesting an additional response whenever the limit was reached, but this approach led to undesirable behavior. Furthermore, it is not currently possible to request audio tokens only; the API forces a choice between text+audio and text-only.
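The workaround above can be sketched as two client events: a session update that caps the response size, and a follow-up `response.create` once a response ends early. The event and field names (`session.update`, `max_response_output_tokens`, `modalities`) follow the beta API reference; the specific token cap is illustrative, and whether the continuation resumes cleanly is precisely where I hit the undesirable behavior:

```python
def capped_session_update(max_tokens: int) -> dict:
    """Cap how many tokens each response may generate."""
    return {
        "type": "session.update",
        "session": {
            # The beta only accepts ["text"] or ["text", "audio"];
            # an audio-only modality is not currently allowed.
            "modalities": ["text", "audio"],
            "max_response_output_tokens": max_tokens,
        },
    }

def continue_response() -> dict:
    """Request the next chunk after a response stops at the token cap
    (i.e. it finished with an 'incomplete' status)."""
    return {"type": "response.create"}
```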

I understand this API is still in beta, but I would suggest the following improvements:

  1. Enable the option to set the modality to audio only.
  2. Adjust the generation rate of audio tokens to match the actual audio duration.

Thank you for your consideration.
