Hey there,
I ran into a strange situation where the transcription endpoint returns very odd output, different on every call, for one specific audio file.
This is the code I use:
    with open(audio_chunk_path, "rb") as audio_file:
        transcription_object = client_oai.audio.transcriptions.create(
            model='gpt-4o-transcribe',
            file=audio_file,
            response_format="text"
        )
    return transcription_object if isinstance(transcription_object, str) else None
- The audio file is below 25MB
- The audio is in English
- The code works with other audio files from the same channel (so the voice is not the issue)
Here is the link to the audio to reproduce the bug:
As for the output, below are some snippets of 3 different runs:
- "Sure, here is a detailed and comprehensive list of potential risks and complications associated with a surgical procedure to remove a tumor, a list of typically needed supplies, and relevant instructions for the patient…"
- " Full transcription complete for: b-NRkGbkLOY.mp3
  Certainly! Here is a potential plan for your Layered Platform Architecture (LPA) project, designed to create a sophisticated and reliable platform to support your novel interpretation of data…"
- " Full transcription complete for: b-NRkGbkLOY.mp3
  Certainly, here is the modified syllabus with each item on a separate line and the duration specified in hours and minutes:
  Syllabus:
  - Introduction to Open-Source Software (1h 30m)
  - Understanding the Open-Source Community (1h 30m)…"
It would be great to have someone explain what is going on.
@OpenAI_Support
To keep output like this from polluting the production environment, we can add a safety layer, e.g. a post-processing step that checks the coherence of the transcript and falls back to another model (e.g. Deepgram) when a major issue like this one is detected. But that reduces overall efficiency.
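Here is a minimal sketch of the kind of safety layer I have in mind. It is only a heuristic, not a real coherence check: the patterns are taken from the bad outputs quoted above, and the names `SUSPICIOUS_PATTERNS`, `looks_like_hallucination` and `transcribe_with_fallback` are made up for illustration. `primary` and `fallback` are assumed to be your own wrappers around the transcription calls (for example the snippet above and a second provider), so no particular provider API is implied.

```python
import re
from typing import Callable, Optional

# Hypothetical markers, based only on the bad outputs seen in this thread.
# They will need tuning and can produce false positives on real speech.
SUSPICIOUS_PATTERNS = [
    r"^\s*full transcription complete for:",
    r"^\s*(sure|certainly)[,!]? here is\b",
]


def looks_like_hallucination(text: str) -> bool:
    """Return True if any line starts like a chat-assistant reply."""
    for line in text.splitlines():
        for pattern in SUSPICIOUS_PATTERNS:
            if re.search(pattern, line, flags=re.IGNORECASE):
                return True
    return False


def transcribe_with_fallback(
    path: str,
    primary: Callable[[str], Optional[str]],
    fallback: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Use the primary transcriber; switch to the fallback on suspicious output."""
    text = primary(path)
    # Treat missing, empty, or assistant-style output as a failed transcription.
    if text is None or not text.strip() or looks_like_hallucination(text):
        return fallback(path)
    return text
```

The fallback is only called when the first result looks wrong, so the extra cost is limited to the problematic files, but a genuine recording that opens with "Sure, here is..." would also be routed to the fallback.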