I don't think anyone expects audio processing to be flawless; hiccups are totally fine.
What can’t happen, though, is losing five minutes of spoken thought forever.
When we use Voice Mode, we’re often “thinking out loud.” The exact phrasing, the flow, and the insights that surface while speaking are hard to recreate. If the transcript fails and the original audio is discarded, that moment is gone.
I’ve hit this problem often enough to warrant a feature request:
- Keep the raw audio until the user confirms the transcript.
  If transcription or network errors occur, simply let us download the file.
- (Nice-to-have) Store recordings locally in the app.
  Even when transcription succeeds, we might want to re-process the audio elsewhere or compare it against the generated text.
This small change would remove the tension of “will it capture everything?” and let us focus on thinking instead of worrying about losing ideas.
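To make the request concrete, here is a rough sketch of the retention rule I have in mind: the audio stays available until the user explicitly confirms the transcript, and any failure leaves it downloadable. This is only an illustration, not how the app works today; `Recording`, `State`, and every method name below are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class State(Enum):
    RECORDED = auto()     # audio captured, transcription not finished yet
    TRANSCRIBED = auto()  # transcript produced, waiting for the user to confirm
    FAILED = auto()       # transcription or network error
    CONFIRMED = auto()    # user accepted the transcript


@dataclass
class Recording:
    """Hypothetical model of one voice take and its transcript."""
    audio: bytes
    state: State = State.RECORDED
    transcript: Optional[str] = None

    def on_transcript(self, text: str) -> None:
        self.transcript = text
        self.state = State.TRANSCRIBED

    def on_error(self) -> None:
        # A failure never touches the audio; it only changes the state.
        self.state = State.FAILED

    def confirm(self) -> None:
        if self.state is not State.TRANSCRIBED:
            raise ValueError("there is no transcript to confirm")
        self.state = State.CONFIRMED

    def can_discard_audio(self) -> bool:
        # The only state in which deleting the raw audio is safe.
        return self.state is State.CONFIRMED

    def download(self) -> bytes:
        # The raw audio is always retrievable while it is retained.
        return self.audio


take = Recording(audio=b"raw-audio-bytes")
take.on_error()                  # e.g. the network dropped mid-upload
print(take.can_discard_audio())  # False: the recording must be kept
```

The point of the design is that discarding audio is gated on a single explicit user action rather than on transcription having merely finished.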
| Use-case | Practical workaround |
|---|---|
| Long brainstorming | Record in your phone’s Voice Memos (or any recorder) first, then upload the saved file to ChatGPT. If transcription fails, your recording is safe. |
| Staying in the ChatGPT app | Keep each take under ~2 minutes and wait for the partial transcript to appear before continuing. Short clips rarely fail. |
| Critical sessions | Run a parallel recorder while you speak. Delete the extra file only after the transcript looks good. |