I’m facing an issue with Whisper when trying to transcribe audio. I understand that Whisper doesn’t do great with .mp3 files, but even when I convert the audio to a .wav file, I only get a good transcription on the first run; after that, Whisper just spits out gibberish.
That hasn’t been my experience. I am, however, having a major headache with Whisper’s auto-punctuation, despite my post-processing. I think you’ll need to give more details, such as what software you’re using to record the audio, what platform you’re on, etc.
Please note that I’m just an average user, not a developer, by any means.
Most of my responses here will be “typed” with whisper.
I’m also getting complete gibberish from my audio files. Don’t know what the issue is. I have tried changing file formats but just consistently get gibberish.
If an audio file is accepted but somehow interpreted as “silence” or “static/noise” by Whisper, then you’ll get gibberish, as the AI hallucinates without exception in such instances.
I’d speculate that if you gave more specifics, such as your platform, the type of recorder used, the encoding, and how you verified the audio, you’d be more likely to get useful input from others. I am no “dev” myself.
If you are on macOS, you could try Whispering (GitHub: braden-w/whispering). It doesn’t prove anything about your audio file, but it will demonstrate to you that an audio snippet of “silence/static” leads to gibberish.
On Windows, you could try cURL, which is more or less baked into Windows 10/11:
curl https://api.openai.com/v1/audio/transcriptions -H "Authorization: Bearer your-API-key" -H "Content-Type: multipart/form-data" -F model="whisper-1" -F response_format="text" -F file="@C:/Users/username/Desktop/WhisperAudioTest.m4a" | clip
Open CMD and run the above, all on one line. If you are behind a proxy server, you’d have to add the appropriate flags to “curl”.
If you get gibberish in the clipboard (press Ctrl-V in a text field), then you can assume there is something “wrong” (or rather, incompatible) with your “WhisperAudioTest.xxx” audio file.
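If you want to check the “silence/static” theory before blaming Whisper, a rough way is to measure how loud the file actually is. The sketch below is only an illustration, not anything from Whisper itself: it assumes a 16-bit PCM .wav file, the function names (wav_rms, looks_silent) are made up for this example, and the 0.01 threshold is an arbitrary guess you would need to tune.

```python
import array
import math
import sys
import wave


def wav_rms(path):
    """Return the RMS level of a 16-bit PCM WAV file, scaled to 0.0-1.0."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        frames = wf.readframes(wf.getnframes())

    # Interpret the raw bytes as signed 16-bit samples (WAV is little-endian).
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()

    if not samples:
        return 0.0  # an empty file counts as silence

    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / 32768.0


def looks_silent(path, threshold=0.01):
    """Guess whether the file is effectively silent (threshold is an assumption)."""
    return wav_rms(path) < threshold
```

For example, looks_silent("WhisperAudioTest.wav") returning True would suggest the recording itself is (nearly) empty, in which case gibberish from Whisper is what you would expect. Note that this only catches silence; a file full of loud static would still pass, and other formats such as .m4a would need converting to .wav first.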
So we have made an app that tracks a play session between a therapist and a child. They needed to track how many times they’ve said selected words.
It’s working fairly well at the moment. Basically, it just transcribes in realtime and highlights the selected words.
However, something odd happens sometimes: it sort of “goes crazy” and adds a ton of gibberish onto the end of the transcription.
Any ideas?