Hello.
Transcription works fine when I use a file as the source, but I do not want to rely on disk storage just to transcribe an audio segment. The following disk-based code works:
from pydub import AudioSegment

audio = AudioSegment.from_file(audio_path_filename_ext, format="mp3")
segment = audio[start_time:end_time]  # pydub slices in milliseconds
segment.export(temp_path_filename_ext)  # defaults to mp3; writes to disk
with open(temp_path_filename_ext, 'rb') as audio_file:
    transcription = self.__client.audio.transcriptions.create(
        model=config.chatgpt_voice_recognition_model,
        file=audio_file,
        language=config.language,
        temperature=0,
    )
However, when I attempted to use io.BytesIO() instead, I got an error.
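The attempt looked something like this (a sketch, reusing the names from the snippet above; presumably the missing filename is what trips it up, as the reply below explains):

import io

buffer = io.BytesIO()
segment.export(buffer, format="mp3")  # pydub can export to a file-like object
buffer.seek(0)
transcription = self.__client.audio.transcriptions.create(
    model=config.chatgpt_voice_recognition_model,
    file=buffer,  # fails: the buffer carries no filename
    language=config.language,
    temperature=0,
)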
I believe the only way forward is to dump the data sent to OpenAI by both pieces of code, analyze the differences, and then write a custom class that corrects what is being sent. I will try this when I have free time.
I got around this issue by setting the .name attribute on the buffer object. It appears that the Whisper API is inferring the file type from the extension on this attribute, rather than inspecting the raw bytes themselves. Here’s a snippet that worked for me (I’m using GraphQL with multipart file uploads).
import io

import strawberry
from openai import AsyncOpenAI
from strawberry.file_uploads import Upload

openai_client = AsyncOpenAI()  # picks up OPENAI_API_KEY from the environment

@strawberry.type
class Mutation:
    @strawberry.mutation
    async def transcribe(self, audio_file: Upload) -> str:
        audio_data = await audio_file.read()
        buffer = io.BytesIO(audio_data)
        buffer.name = "file.mp3"  # this is the important line
        transcription = await openai_client.audio.transcriptions.create(
            model="whisper-1",
            file=buffer,
        )
        return transcription.text
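The same trick should carry back to the original pydub flow, skipping the temp file entirely. A sketch, reusing the names from the first post (untested):

buffer = io.BytesIO()
segment.export(buffer, format="mp3")  # export straight into memory
buffer.name = "segment.mp3"  # the extension is what the API infers the format from
buffer.seek(0)  # rewind before uploading
transcription = self.__client.audio.transcriptions.create(
    model=config.chatgpt_voice_recognition_model,
    file=buffer,
    language=config.language,
    temperature=0,
)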
Looking at the types in the Python SDK, it looks as though you can pass a bytes object to the file argument, but I haven't gotten this to work.
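The type hints do also allow an httpx-style (filename, contents) tuple, which supplies the filename the API needs without a wrapper object, so that may be the missing piece when passing raw bytes. A sketch of that variant (untested):

transcription = await openai_client.audio.transcriptions.create(
    model="whisper-1",
    file=("file.mp3", audio_data),  # (filename, bytes) tuple; the extension plays the role of .name
)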
Thanks to @ahmed.alsaba for pointing me toward the right post.
Thanks a lot to you guys. I would never have thought to set the name before sending it, but this saved me from a massive headache and a lot of frustration ^^