OpenAI Whisper - Send Bytes (Python) instead of filename

Hi,

I hope you’re well. I’m really enjoying using the OpenAI API, but I recently ran into a challenge and am looking for some help.

I don’t want to save audio to disk and delete it with a background task.

My FastAPI application uses an UploadFile (meaning users upload the file, and I then have access to a SpooledTemporaryFile).

Previously, using the free version of Whisper on GitHub, I was able to send the bytes to the model, whereas this API doesn’t work this way.

Can anyone advise on how they are transcribing audio in Python without saving video/audio files to disk?

Thanks,

Hi, I am using openai-whisper from GitHub in my Django app, and I am sending bytes just like you, but it is not working.

Can you please share your code?

Any updates on this issue?

Hi All,

I also had this problem and managed to find a solution. I was using pydub to load and edit audio segments, and wanted to send a pydub audio segment directly to Whisper without creating a temporary file. The following approach worked: create a BytesIO buffer, encode the audio into it in a supported format, and then pass it to Whisper:

import io

import openai
from pydub import AudioSegment

fname = "file.mp3"
audio = AudioSegment.from_file(fname, format="mp3")
# only use first 5sec
audio = audio[:5000]

buffer = io.BytesIO()
# you need to set the name with the extension
buffer.name = fname
audio.export(buffer, format="mp3")

transcript = openai.Audio.transcribe("whisper-1", buffer)
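
For the FastAPI case in the original question, the same idea should carry over: read the uploaded bytes, wrap them in a BytesIO, and give the buffer a name with the right extension. Below is a minimal sketch, not a tested integration. The helper name to_named_buffer and the /transcribe endpoint are hypothetical, and the commented-out endpoint assumes the same pre-1.0 openai.Audio.transcribe call used above.

```python
import io


def to_named_buffer(data: bytes, filename: str) -> io.BytesIO:
    """Wrap raw audio bytes in an in-memory file that carries a filename."""
    buffer = io.BytesIO(data)
    # As noted above, the name needs to include the file extension.
    buffer.name = filename
    return buffer


# Hypothetical usage inside a FastAPI endpoint (not executed here):
#
#   @app.post("/transcribe")
#   async def transcribe(file: UploadFile):
#       buffer = to_named_buffer(await file.read(), file.filename)
#       return openai.Audio.transcribe("whisper-1", buffer)

# Stand-in bytes so the helper can be exercised without a real upload.
buffer = to_named_buffer(b"\x00\x01fake-audio", "upload.mp3")
```

Nothing is written to disk here; the uploaded bytes only ever live in memory.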