When attempting to transcribe an MP3 with the Whisper API, I get an error saying the file needs to be an MP3?

I’m trying to evaluate Whisper for audio transcription and am having some difficulty using it. To test it, I recorded a short MP3 file (2m 30s):


(I had to obfuscate the URL because the forum won’t let me include links, and apparently I can’t attach MP3 files either. But I don’t see how anyone can realistically figure out what’s wrong without access to the file itself.)

I then tried to upload it with this code:

$client = new \GuzzleHttp\Client([
	'base_uri' => 'https://api.openai.com/',
	'verify' => false // disables TLS certificate verification; only acceptable for a quick local test
]);

$file = 'audio.mp3';
$audio = file_get_contents($file);

$response = $client->request('POST', '/v1/audio/transcriptions', [
	// Guzzle's 'auth' option sends HTTP Basic auth, so the Bearer token goes in the header directly
	'headers' => ['Authorization' => 'Bearer sk-...'],
	'multipart' => [
		['name' => 'file', 'contents' => $audio, 'filename' => $file],
		['name' => 'model', 'contents' => 'whisper-1'],
		['name' => 'language', 'contents' => 'en']
	]
]);

$transcript = $response->getBody()->getContents();
$transcript = json_decode($transcript);

When I ran that code, however, I got this error back:

{
  "error": {
    "message": "Invalid file format. Supported formats: ['flac', 'm4a', 'mp3', 'mp4', 'mpeg', 'mpga', 'oga', 'ogg', 'wav', 'webm']",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}
The file definitely is an MP3. On Windows I checked it with TrID (mark0.net/soft-trid-e.html) and got this back:

TrID/32 - File Identifier v2.24 - (C) 2003-16 By M.Pontello
Definitions found:  17785

Collecting data from file: test.mp3
 62.5% (.MP3) LAME encoded MP3 audio (ID3 v2.x tag) (5000/1/1)
 37.5% (.MP3) MP3 audio (ID3 v2.x tag) (3000/1)

I also ran the file command on it as file test (I removed the .mp3 extension to make sure it wasn’t guessing the type from the name) and got this:

test: Audio file with ID3 version 2.4.0, contains: MPEG ADTS, layer III, v1, 192 kbps, 44.1 kHz, Stereo
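For reference, what file and TrID are describing can be checked directly by looking at the first few bytes. Below is a minimal PHP sketch; describeMp3Start is a hypothetical helper name, not part of any library, and it assumes the start of the file has already been read into a string. A file that file reports as carrying an ID3 version 2.4.0 tag should begin with the bytes "ID3" followed by 0x04 0x00, not with an MPEG frame sync.

```php
<?php
// Hypothetical helper: report what the first bytes of an MP3 file look like.
// An ID3v2 tag starts with "ID3", then the major version and revision bytes.
// A bare MPEG audio stream starts with a frame sync: 0xFF, then a byte whose top 3 bits are set.
function describeMp3Start(string $data): string
{
    if (strlen($data) >= 5 && substr($data, 0, 3) === 'ID3') {
        return sprintf('ID3v2.%d.%d tag', ord($data[3]), ord($data[4]));
    }
    if (strlen($data) >= 2 && ord($data[0]) === 0xFF && (ord($data[1]) & 0xE0) === 0xE0) {
        return 'MPEG audio frame sync';
    }
    return 'unknown';
}

// Example usage (test.mp3 is the file from this post):
// $head = file_get_contents('test.mp3', false, null, 0, 10); // first 10 bytes only
// echo bin2hex($head), ' ', describeMp3Start($head), PHP_EOL;
```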

Am I doing something wrong, or does Whisper just not work?

You might have to strip the ID3 tags from the file and ensure that it starts with a valid MP3 frame.
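One way to do that in the same PHP script, before uploading, is sketched below. It is not a full tag parser: stripId3Tags and startsWithFrameSync are hypothetical helper names, and the code only handles a leading ID3v2 tag (skipping the 10-byte header, the syncsafe size stored in header bytes 6 to 9, and the optional 10-byte v2.4 footer) plus a trailing 128-byte ID3v1 tag.

```php
<?php
// Sketch: remove a leading ID3v2 tag and a trailing ID3v1 tag from MP3 bytes.
function stripId3Tags(string $data): string
{
    // Leading ID3v2 tag(s): "ID3", version (2 bytes), flags (1 byte), syncsafe size (4 bytes).
    while (strlen($data) >= 10 && substr($data, 0, 3) === 'ID3') {
        // Syncsafe integer: 7 significant bits per byte; excludes header and footer.
        $size = (ord($data[6]) << 21) | (ord($data[7]) << 14)
              | (ord($data[8]) << 7) | ord($data[9]);
        $footer = (ord($data[5]) & 0x10) ? 10 : 0; // flag bit 4: a 10-byte footer follows
        $data = substr($data, 10 + $size + $footer);
    }
    // Trailing ID3v1 tag: a fixed 128-byte block starting with "TAG".
    if (strlen($data) >= 128 && substr($data, -128, 3) === 'TAG') {
        $data = substr($data, 0, -128);
    }
    return $data;
}

// True if the data begins with an MPEG frame sync (0xFF, then top 3 bits set).
function startsWithFrameSync(string $data): bool
{
    return strlen($data) >= 2 && ord($data[0]) === 0xFF && (ord($data[1]) & 0xE0) === 0xE0;
}
```

With the upload code from the question, you would pass stripId3Tags($audio) as the 'contents' of the file part instead of $audio, after checking startsWithFrameSync() on the result. Whether that alone is enough for the API to accept the file is an assumption that is worth testing on the actual recording.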

MP3 is a streaming format, and a stream parser should be able to resume decoding mid-stream, skipping any invalid data until it finds the next frame. When the audio comes as a file, garbage at the start often isn’t handled so gracefully, especially if a naive parser is validating the input.
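To illustrate what that resynchronization looks like, here is a rough sketch of how a tolerant parser might find the first real frame. It scans for the 11-bit frame sync, rejects headers with reserved or invalid field values, and, for MPEG-1 Layer III (what file reports for this recording), confirms that another header starts exactly one frame length later, using frame length = floor(144000 * bitrate_kbps / sample_rate) + padding. The function names are hypothetical; other MPEG versions and layers are accepted on the header check alone, and a file containing only a single frame would not be confirmed.

```php
<?php
// Hypothetical helper: parse a plausible MPEG audio frame header at offset $i, or return null.
function headerAt(string $data, int $i): ?array
{
    if ($i < 0 || $i + 4 > strlen($data)) {
        return null;
    }
    $b1 = ord($data[$i + 1]);
    $b2 = ord($data[$i + 2]);
    if (ord($data[$i]) !== 0xFF || ($b1 & 0xE0) !== 0xE0) {
        return null; // no 11-bit frame sync
    }
    $version = ($b1 >> 3) & 0x03;      // 11 = MPEG-1, 10 = MPEG-2, 00 = MPEG-2.5, 01 = reserved
    $layer = ($b1 >> 1) & 0x03;        // 01 = Layer III, 10 = Layer II, 11 = Layer I, 00 = reserved
    $bitrateIndex = ($b2 >> 4) & 0x0F; // 1111 is invalid
    $rateIndex = ($b2 >> 2) & 0x03;    // 11 is reserved
    if ($version === 0x01 || $layer === 0x00 || $bitrateIndex === 0x0F || $rateIndex === 0x03) {
        return null;
    }
    return ['version' => $version, 'layer' => $layer, 'bitrate' => $bitrateIndex,
            'rate' => $rateIndex, 'padding' => ($b2 >> 1) & 0x01];
}

// Frame length in bytes for MPEG-1 Layer III only; null for other versions/layers
// and for the "free" bitrate (index 0), whose length isn't derivable from the header.
function mpeg1Layer3FrameLength(array $h): ?int
{
    $kbps = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320];
    $rates = [44100, 48000, 32000];
    if ($h['version'] !== 0x03 || $h['layer'] !== 0x01 || $h['bitrate'] === 0) {
        return null;
    }
    return intdiv(144000 * $kbps[$h['bitrate']], $rates[$h['rate']]) + $h['padding'];
}

// Skip bytes until a header is found (confirmed by its successor for MPEG-1 Layer III).
// Returns the byte offset of the first frame, or -1 if none is found.
function findFirstFrame(string $data): int
{
    for ($i = 0; $i + 4 <= strlen($data); $i++) {
        $h = headerAt($data, $i);
        if ($h === null) {
            continue;
        }
        $len = mpeg1Layer3FrameLength($h);
        if ($len === null || headerAt($data, $i + $len) !== null) {
            return $i;
        }
    }
    return -1;
}
```

Slicing the data at the returned offset, e.g. substr($data, $offset) after checking that $offset is not -1, gives bytes that begin on a frame boundary. The successor check is there because a lone 0xFF byte followed by a suitable second byte can easily appear inside tag data by chance.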