I’m still learning about programming, so I apologize if I’m missing something obvious, but I wanted to bring up two issues I’ve encountered with the GPT-4o voice generation API (gpt-4o-audio-preview / realtime audio):
The main concern is:
1. The API sometimes generates silent audio that we didn’t request, and that audio still consumes credits (some users reported $10 charges). Could the API detect extended silence and stop processing when it occurs?
Also:
2. When requests are rejected, we get spoken “I’m sorry” responses. Would it be possible to return a simpler rejection signal instead, such as an error code?
If these are issues that can be handled on our end through proper implementation, I’d really appreciate any guidance. However, if these are API-side matters, would you consider looking into potential improvements?
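To make the silence question concrete, here is a rough sketch of the kind of client-side check I have in mind. It is only an illustration under assumptions I have not verified: that the audio arrives as raw 16-bit little-endian mono PCM at 24 kHz, and that an RMS threshold of 100 is a sensible definition of "silence". The names `SilenceWatchdog`, `chunk_rms`, and all the constants are my own and not part of any SDK.

```python
import array
import math
import sys

SAMPLE_RATE_HZ = 24_000      # assumption: 24 kHz mono PCM16 output
SILENCE_RMS = 100.0          # assumption: RMS below this counts as silence
MAX_SILENCE_SECONDS = 5.0    # assumption: give up after 5 s of continuous silence


def chunk_rms(pcm_bytes: bytes) -> float:
    """Root-mean-square amplitude of a chunk of 16-bit little-endian PCM."""
    usable = len(pcm_bytes) - (len(pcm_bytes) % 2)  # drop a trailing odd byte
    samples = array.array("h")
    samples.frombytes(pcm_bytes[:usable])
    if sys.byteorder == "big":
        samples.byteswap()  # the data is assumed little-endian
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class SilenceWatchdog:
    """Tracks how long the incoming audio has been continuously silent."""

    def __init__(self, max_silence_seconds: float = MAX_SILENCE_SECONDS,
                 rms_threshold: float = SILENCE_RMS,
                 sample_rate_hz: int = SAMPLE_RATE_HZ) -> None:
        self.max_silence_seconds = max_silence_seconds
        self.rms_threshold = rms_threshold
        self.sample_rate_hz = sample_rate_hz
        self.silent_seconds = 0.0

    def feed(self, pcm_bytes: bytes) -> bool:
        """Feed one audio chunk; return True once the silence limit is reached."""
        duration = (len(pcm_bytes) // 2) / self.sample_rate_hz
        if chunk_rms(pcm_bytes) < self.rms_threshold:
            self.silent_seconds += duration
        else:
            self.silent_seconds = 0.0  # any audible chunk resets the timer
        return self.silent_seconds >= self.max_silence_seconds
```

The idea would be to call `feed()` on each decoded audio chunk as it streams in and stop or cancel the request once it returns `True`. I don’t know whether cancelling on the client actually stops billing for audio that has already been generated, which is part of why I’m asking whether this should be handled on the API side.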
Thanks for your time and consideration!