I’m trying to evaluate Whisper for audio transcription and am having some difficulty using it. To test it I recorded a short MP3 file (2m 30s):
terrafrost[dot]com/test.mp3
(I obfuscated the URL because the forum won’t let me include links, and apparently I can’t attach MP3 files either. I don’t see how anyone has a realistic shot at figuring out what’s wrong without the file, though.)
I then tried to upload it with this code:
$client = new \GuzzleHttp\Client([
    'base_uri' => 'https:// api.openai.com/', // added space because links aren't allowed
    'verify' => false
]);
$audio = file_get_contents($file);
$file = 'audio.mp3';
$response = $client->request('POST', '/v1/audio/transcriptions', [
    'auth' => ['Bearer', 'sk-...'],
    'multipart' => [
        ['name' => 'file', 'contents' => $audio, 'filename' => $file],
        ['name' => 'model', 'contents' => 'whisper-1'],
        ['name' => 'language', 'contents' => 'en']
    ]
]);
$transcript = $response->getBody()->getContents();
$transcript = json_decode($transcript);
When I ran that code, however, I got this error back:
{
    "error": {
        "message": "Invalid file format. Supported formats: ['flac', 'm4a', 'mp3', 'mp4', 'mpeg', 'mpga', 'oga', 'ogg', 'wav', 'webm']",
        "type": "invalid_request_error",
        "param": null,
        "code": null
    }
}
The file absolutely is an MP3. On Windows I checked it with TrID (mark0[dot]net/soft-trid-e.html) and got this back:
TrID/32 - File Identifier v2.24 - (C) 2003-16 By M.Pontello
Definitions found: 17785
Analyzing...
Collecting data from file: test.mp3
62.5% (.MP3) LAME encoded MP3 audio (ID3 v2.x tag) (5000/1/1)
37.5% (.MP3) MP3 audio (ID3 v2.x tag) (3000/1)
I also ran the file command on it as "file test" (I removed the .mp3 extension to make sure it wasn’t guessing the type from that) and got this:
test: Audio file with ID3 version 2.4.0, contains: MPEG ADTS, layer III, v1, 192 kbps, 44.1 kHz, Stereo
Am I doing something wrong or does Whisper just not work?
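Two things in the snippet above may be worth ruling out before blaming the file itself. First, $file is assigned after file_get_contents($file) is called, so $audio could be false or empty at upload time. Second, Guzzle's 'auth' option expects a username and password (HTTP Basic by default), so the API key might need to go in an Authorization header instead. Below is a minimal sketch of a pre-upload check under those assumptions. The helper name buildTranscriptionOptions is hypothetical (it is not part of Guzzle or the OpenAI API), and the MP3 test only looks at the first few bytes, so it is a sanity check rather than real validation.

```php
<?php
// Hypothetical helper (not from the original post): builds the Guzzle request
// options only after confirming that audio bytes were actually read.
function buildTranscriptionOptions(string $path, string $apiKey): array
{
    // Read the file only once $path is known. file_get_contents() returns
    // false on failure, which would otherwise be sent as an empty upload.
    $audio = @file_get_contents($path);
    if ($audio === false || $audio === '') {
        throw new RuntimeException("Could not read any audio data from: $path");
    }

    // Cheap sanity check: an MP3 normally starts with an ID3v2 tag ("ID3")
    // or an MPEG frame sync (0xFF, then a byte whose top three bits are set).
    $hasId3 = strncmp($audio, 'ID3', 3) === 0;
    $hasSync = strlen($audio) >= 2
        && ord($audio[0]) === 0xFF
        && (ord($audio[1]) & 0xE0) === 0xE0;
    if (!$hasId3 && !$hasSync) {
        throw new RuntimeException("File does not start like an MP3: $path");
    }

    return [
        // Assumption: the key is sent as a Bearer token header, since
        // Guzzle's 'auth' option is meant for username/password schemes.
        'headers' => ['Authorization' => 'Bearer ' . $apiKey],
        'multipart' => [
            ['name' => 'file', 'contents' => $audio, 'filename' => basename($path)],
            ['name' => 'model', 'contents' => 'whisper-1'],
            ['name' => 'language', 'contents' => 'en'],
        ],
    ];
}
```

With the existing $client, the result could be passed straight through, e.g. $client->request('POST', '/v1/audio/transcriptions', buildTranscriptionOptions('audio.mp3', 'sk-...')). If the helper throws, the problem is on the reading side rather than with Whisper.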