I also had this problem and managed to find a solution. I was using pydub to load and edit audio segments and wanted to send a pydub audio segment directly to Whisper without creating a temporary file. The following approach worked: create a BytesIO buffer, encode the audio into it in a supported format, and then pass it to Whisper:
import io

import openai
from pydub import AudioSegment

fname = "file.mp3"
audio = AudioSegment.from_file(fname, format="mp3")
# only use the first 5 seconds
audio = audio[:5000]

buffer = io.BytesIO()
# you need to set the name with the extension
buffer.name = fname
audio.export(buffer, format="mp3")

transcript = openai.Audio.transcribe("whisper-1", buffer)
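(Note that the snippet above uses the pre-1.0 openai package. The same buffer trick should carry over to the newer client interface; a minimal sketch of that, assuming the 1.x SDK:)

import io

from openai import OpenAI
from pydub import AudioSegment

client = OpenAI()  # reads OPENAI_API_KEY from the environment

audio = AudioSegment.from_file("file.mp3", format="mp3")[:5000]

buffer = io.BytesIO()
buffer.name = "file.mp3"  # the extension is still what conveys the format
audio.export(buffer, format="mp3")

transcript = client.audio.transcriptions.create(model="whisper-1", file=buffer)
print(transcript.text)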
The solution from @jayseaeff worked for me, even without using pydub. The important part is to set the .name attribute on the buffer object: the Whisper API appears to infer the file type from the extension on that attribute rather than inspecting the raw bytes themselves. Here's a snippet that works in my setup (GraphQL with multipart file uploads):
import io

import strawberry
from openai import AsyncOpenAI
from strawberry.file_uploads import Upload

openai_client = AsyncOpenAI()


@strawberry.type
class Mutation:
    @strawberry.mutation
    async def transcribe(self, audio_file: Upload) -> str:
        audio_data = await audio_file.read()
        buffer = io.BytesIO(audio_data)
        buffer.name = "file.mp3"  # this is the important line
        transcription = await openai_client.audio.transcriptions.create(
            model="whisper-1",
            file=buffer,
        )
        return transcription.text
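One caveat with hardcoding buffer.name = "file.mp3": if a client uploads audio that isn't MP3, the extension and the bytes will disagree. In the ASGI integrations, strawberry's Upload is typically a starlette UploadFile, which carries the original filename, so you can reuse it instead (a small sketch, assuming that integration):

audio_data = await audio_file.read()
buffer = io.BytesIO(audio_data)
# Reuse the uploaded file's own name so the extension matches the bytes,
# falling back to a default if the client sent no filename.
buffer.name = getattr(audio_file, "filename", None) or "file.mp3"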
Looking at the types in the Python SDK, it looks as though you can pass a bytes object to the file argument, but I haven't gotten this to work.
Yeah, in order to send the bytes of a file you need to pass more than just the raw bytes to the file parameter. You can see which types the file parameter accepts by jumping to its definition in VS Code or IntelliJ. Here's the pattern that is working for me:
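A minimal sketch of that pattern, assuming the 1.x SDK, whose type hints for file allow a (filename, contents) tuple so the extension travels alongside the raw bytes:

from pathlib import Path

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Raw bytes from disk here, but they could just as well come from an
# upload, a message queue, etc.
audio_bytes = Path("file.mp3").read_bytes()

# Passing a (filename, bytes) tuple lets the API infer the format from
# the extension, just like the buffer.name trick in the answers above.
transcript = client.audio.transcriptions.create(
    model="whisper-1",
    file=("file.mp3", audio_bytes),
)
print(transcript.text)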