OpenAI Whisper: Send bytes (Python) instead of a filename

Hi,

I hope you’re well. I’m really enjoying using the OpenAI API, but I’ve recently run into a problem and was hoping for some help.

I don’t want to save audio to disk and delete it with a background task.

My FastAPI application uses an UploadFile (users upload the file, and I then have access to a SpooledTemporaryFile).

Previously, with the free version of Whisper from GitHub, I was able to send the bytes straight to the model, but the API doesn’t seem to accept them that way.

Can anyone advise on how they transcribe audio in Python without saving video/audio to disk?

Thanks,
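A minimal sketch of the in-memory route, assuming FastAPI’s `UploadFile.file` is the `SpooledTemporaryFile` described above. `SpooledTemporaryFile` exposes `name` as a read-only property, so instead of renaming it, the sketch copies its contents into an `io.BytesIO` and gives that buffer a filename. `to_named_buffer` is a hypothetical helper name. Note that a `SpooledTemporaryFile` can itself roll over to a real temporary file once it exceeds its `max_size`, so this avoids your own code writing to disk rather than guaranteeing nothing ever touches it.

```python
import io
from typing import IO


def to_named_buffer(spooled: IO[bytes], filename: str) -> io.BytesIO:
    """Hypothetical helper: copy an uploaded file object into a named in-memory buffer.

    Keep the original extension in `filename` (e.g. "clip.mp3"); the API appears
    to use it to decide which audio format it is receiving.
    """
    spooled.seek(0)  # the upload may already have been read once
    buffer = io.BytesIO(spooled.read())
    buffer.name = filename  # BytesIO allows extra attributes; this one serves as the filename
    return buffer
```

In a FastAPI endpoint this would be called with the upload’s `.file` and `.filename` attributes, and the returned buffer passed as the `file` argument of the transcription call.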

Hi, I am using openai-whisper from GitHub in my Django app and sending bytes just like you, but it is not working.

Can you please share your code?


Any updates on this issue?


Hi all,

I also had this problem and managed to find a solution. I was using pydub to load and edit audio segments and wanted to send a pydub AudioSegment directly to Whisper without creating a temporary file. The following approach worked: create a BytesIO buffer, export the audio into it in a supported format, and pass the buffer to Whisper:

```python
import io

import openai  # legacy SDK interface (openai<1.0)
from pydub import AudioSegment

fname = "file.mp3"
audio = AudioSegment.from_file(fname, format="mp3")
# only use the first 5 seconds (pydub slices in milliseconds)
audio = audio[:5000]

buffer = io.BytesIO()
# you need to set the name with the extension
buffer.name = fname
audio.export(buffer, format="mp3")

transcript = openai.Audio.transcribe("whisper-1", buffer)
```
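The same buffer-plus-name idea also works without pydub (and without ffmpeg) when the audio is already raw PCM. This sketch swaps in the standard-library `wave` module to encode PCM samples as WAV inside a `BytesIO`. The `pcm_to_wav_buffer` name and the 16 kHz, mono, 16-bit defaults are illustrative assumptions, not taken from the post above.

```python
import io
import wave


def pcm_to_wav_buffer(
    pcm: bytes,
    sample_rate: int = 16_000,
    channels: int = 1,
    sample_width: int = 2,  # bytes per sample, i.e. 16-bit audio
    name: str = "clip.wav",
) -> io.BytesIO:
    """Hypothetical helper: wrap raw little-endian PCM in a WAV container, in memory."""
    buffer = io.BytesIO()
    # wave.open accepts a file-like object; closing the writer finalizes the
    # header but does not close a file object it was handed.
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(sample_width)
        writer.setframerate(sample_rate)
        writer.writeframes(pcm)
    buffer.name = name  # the ".wav" extension tells the API which format this is
    buffer.seek(0)  # rewind so the upload starts at the RIFF header
    return buffer
```

The returned buffer can be passed to the transcription call in the same way as the pydub buffer.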

The solution from @jayseaeff worked for me, even without using pydub. The important part is to set the .name attribute on the buffer object. It appears that the Whisper API is inferring the file type from the extension on this attribute, rather than inspecting the raw bytes themselves. Here’s a snippet that worked for me (I’m using GraphQL with multipart file uploads).

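```python
import io

import strawberry
from openai import AsyncOpenAI
from strawberry.file_uploads import Upload

openai_client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment


@strawberry.type
class Mutation:
    @strawberry.mutation
    async def transcribe(self, audio_file: Upload) -> str:
        audio_data = await audio_file.read()
        buffer = io.BytesIO(audio_data)
        buffer.name = "file.mp3"  # this is the important line
        transcription = await openai_client.audio.transcriptions.create(
            model="whisper-1",
            file=buffer,
        )
        return transcription.text
```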
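The snippet above hard-codes "file.mp3" for every upload. If the API does infer the format from the extension, a WAV or M4A upload would then be mislabelled. A minimal sketch that keeps the client’s own extension instead, assuming the original filename is available from the upload object (how it is exposed depends on the framework). `buffer_with_extension` and the `.mp3` fallback are illustrative choices, not part of the original code.

```python
import io
import os
from typing import Optional


def buffer_with_extension(
    data: bytes, original_filename: Optional[str], default_ext: str = ".mp3"
) -> io.BytesIO:
    """Hypothetical helper: wrap uploaded bytes in a BytesIO named after the upload's extension."""
    # "Voice Memo.M4A" -> ".m4a"; fall back to default_ext when there is no extension
    ext = os.path.splitext(original_filename or "")[1].lower() or default_ext
    buffer = io.BytesIO(data)
    buffer.name = "upload" + ext
    return buffer
```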

Looking at the types in the Python SDK, it looks as though you can pass a bytes object to the file argument, but I haven’t gotten this to work.

Yeah, to send the raw bytes of a file you need to pass more than just the bytes to the file parameter: a tuple of (filename, bytes, content type). You can see which types the file parameter accepts by stepping into the SDK’s type definitions in VS Code or IntelliJ. Here’s an example of what is working for me:

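```python
from openai import OpenAI
whisper_api_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def transcribe_bytes(file_bytes, file_type, content_type):  # wrapper added so the snippet runs
    return whisper_api_client.audio.transcriptions.create(
        model="whisper-1",
        file=("temp." + file_type, file_bytes, content_type),  # (filename, content, MIME type)
    ).text
```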

where file_type="m4a" and content_type="audio/m4a"
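A variant of the tuple form that derives both the filename and the MIME type from the upload’s original filename, using the standard-library `mimetypes` module. `make_file_tuple` is a hypothetical helper. What `mimetypes` returns depends on the platform’s MIME tables (an extension such as `.m4a` may not be known everywhere), hence the `application/octet-stream` fallback. The result is meant to be passed as `file=` exactly like the tuple above.

```python
import mimetypes
import os
from typing import Tuple


def make_file_tuple(data: bytes, original_filename: str) -> Tuple[str, bytes, str]:
    """Hypothetical helper: build (filename, content, content_type) for the `file` argument."""
    content_type, _encoding = mimetypes.guess_type(original_filename)
    # guess_type returns (None, None) for extensions it does not recognise
    return (
        os.path.basename(original_filename),
        data,
        content_type or "application/octet-stream",
    )
```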
