About gpt-4o audio (wasteful credit consumption)

I’m still learning about programming, so I apologize if I’m missing something obvious, but I wanted to bring up two issues I’ve encountered with the GPT-4o audio generation API (gpt-4o-audio-preview / Realtime API audio):

The main concern is:

1. The API sometimes generates silent audio that we never requested, and it still consumes credits (some users reported $10 charges). If possible, could there be a way to detect extended silence and stop processing when it occurs?

Also:

2. When requests are rejected, we get spoken “I’m sorry” responses. Would it be possible to have simpler rejection notifications (like error codes)?

If these are issues that can be handled on our end through proper implementation, I’d really appreciate any guidance. However, if these are API-side matters, would you consider looking into potential improvements?
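
For the first point, one thing that might be doable on the client side is to watch the incoming audio and flag a long run of silence. Below is a minimal sketch of that idea, not an official method. It assumes the audio arrives as raw 16-bit little-endian mono PCM chunks at 24 kHz; the names `chunk_rms` and `SilenceWatchdog`, the RMS threshold of 200, and the 3-second limit are all hypothetical choices for illustration and would need tuning.

```python
import array
import math
import sys

# All three values are assumptions for illustration, not documented API settings.
SAMPLE_RATE = 24_000         # samples per second, 16-bit mono PCM assumed
SILENCE_RMS = 200.0          # chunks quieter than this count as silence
MAX_SILENCE_SECONDS = 3.0    # how much continuous silence to tolerate


def chunk_rms(pcm16: bytes) -> float:
    """Root-mean-square amplitude of a little-endian 16-bit PCM chunk."""
    samples = array.array("h")
    # Drop a trailing odd byte so the buffer is a whole number of samples.
    samples.frombytes(pcm16[: len(pcm16) - len(pcm16) % 2])
    if sys.byteorder == "big":
        samples.byteswap()
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class SilenceWatchdog:
    """Tracks how long the incoming audio has been continuously silent."""

    def __init__(self, sample_rate=SAMPLE_RATE, threshold=SILENCE_RMS,
                 max_silence=MAX_SILENCE_SECONDS):
        self.sample_rate = sample_rate
        self.threshold = threshold
        self.max_silence = max_silence
        self.silent_seconds = 0.0

    def feed(self, pcm16: bytes) -> bool:
        """Add one chunk; return True once the silence limit is reached."""
        duration = (len(pcm16) // 2) / self.sample_rate
        if chunk_rms(pcm16) < self.threshold:
            self.silent_seconds += duration
        else:
            # Any audible chunk resets the counter.
            self.silent_seconds = 0.0
        return self.silent_seconds >= self.max_silence
```

The idea would be to call `feed()` on each decoded audio chunk as it streams in and, once it returns `True`, stop reading the stream or cancel the response (for the Realtime API, by sending a `response.cancel` event). Whether stopping early actually reduces what gets billed is an assumption I have not verified.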

Thanks for your time and consideration!

I have ChatGPT Plus, and the same thing happens to me. Sometimes it’s even hard to disconnect. It doesn’t happen often, but it does occur. When I connect it to the car, the screen shows I’m on a phone call with my mom :woman_shrugging:.

Honestly, it wouldn’t surprise me if my mom were actually ChatGPT—she’s a machine of a person, incredible and unbeatable :stuck_out_tongue_winking_eye:.

But yes, strange things do happen with the audio; I could write a long text about it. The previous model, even though it can’t be interrupted, is much better. I don’t even know how to activate it on purpose.

Anyway, a lot of things happen with the audio. I’m with you on this :+1:.
