Whisper error 400 "Unrecognized file format."

axiomofjoy · February 15, 2024, 6:58am

I got around this issue by setting the .name attribute on the buffer object. The Whisper API appears to infer the file type from the extension in this attribute rather than by inspecting the raw bytes. Here’s a snippet that worked for me (I’m using GraphQL with multipart file uploads).

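import io

import strawberry
from openai import AsyncOpenAI
from strawberry.file_uploads import Upload

# Assumed setup (not shown in the original snippet): the client reads
# OPENAI_API_KEY from the environment.
openai_client = AsyncOpenAI()


@strawberry.type
class Mutation:
    @strawberry.mutation
    async def transcribe(self, audio_file: Upload) -> str:
        # Read the uploaded audio into an in-memory buffer.
        audio_data = await audio_file.read()
        buffer = io.BytesIO(audio_data)
        buffer.name = "file.mp3"  # this is the important line
        transcription = await openai_client.audio.transcriptions.create(
            model="whisper-1",
            file=buffer,
        )
        return transcription.text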

Judging by the types in the Python SDK, it looks as though you can also pass a bytes object to the file argument, but I haven’t gotten this to work.
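For anyone who wants to reuse the workaround, here is a minimal sketch of two ways to attach a filename to in-memory audio. The helper names named_buffer and as_file_tuple are hypothetical, made up for illustration. The first repeats the .name trick from the snippet above. The second relies on my reading of the SDK's file types, which seem to accept a (filename, content) tuple as well; I have not verified that it fixes the 400 error, so treat it as an assumption.

```python
import io
from typing import Tuple


def named_buffer(audio_data: bytes, filename: str = "file.mp3") -> io.BytesIO:
    """Wrap raw audio bytes in a BytesIO that carries a filename.

    The extension should match the real audio format, since the API
    appears to infer the file type from it.
    """
    buffer = io.BytesIO(audio_data)
    buffer.name = filename  # same trick as in the snippet above
    return buffer


def as_file_tuple(audio_data: bytes, filename: str = "file.mp3") -> Tuple[str, bytes]:
    """Pair raw bytes with a filename.

    Assumption: the SDK accepts a (filename, content) tuple for `file`.
    """
    return (filename, audio_data)
```

Either return value would then be passed as file=... to openai_client.audio.transcriptions.create. Passing bare bytes gives the API no filename at all, which may be why that route fails.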

Thanks to @ahmed.alsaba for pointing me toward the right post.
