I’m testing the new transcription models (gpt-4o-transcribe, gpt-4o-mini-transcribe) for the Latvian language, but every time I call client.audio.transcriptions.create, it returns random text, usually a single sentence unrelated to the audio. Here’s an example:
The older model (whisper-1) provides the correct result.
Other LLM models (e.g., gemini-2.0-flash) also return the correct result for the same audio file.
I have tested gpt-4o-transcribe with English audio, and it produced the correct result.
I’m using the following code to create a transcript:
from IPython.display import Markdown
from openai import OpenAI

client = OpenAI(api_key='API_KEY')

# First 24 min from this source: https://med.latvijasradio.lv/saeima/20250416_1332.mp3
audio_file = open("D:/Downloads/20250416_1332_24min.mp3", "rb")

transcription = client.audio.transcriptions.create(
    model="gpt-4o-transcribe",
    file=audio_file,
    prompt='transkribē šīs Saeimas debates',
    language='lv',
    response_format="json"
)
print(transcription.text)

transcription_mini = client.audio.transcriptions.create(
    model="gpt-4o-mini-transcribe",
    file=audio_file,
    prompt='transkribē šīs Saeimas debates',
    language='lv',
    response_format="json"
)
print(transcription_mini.text)

transcription_lv_wh = client.audio.transcriptions.create(
    file=audio_file,
    model="whisper-1",
    language="lv",
    response_format="verbose_json",
    timestamp_granularities=["segment"]
)
Markdown(transcription_lv_wh.text)
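To check whether the prompt itself influences the output, the three calls above could be folded into one helper that sends identical parameters to every model. This is only a debugging sketch, not a known fix: compare_models and its arguments are names I made up for illustration, and it assumes the same client.audio.transcriptions.create parameters used above.

```python
MODELS = ("gpt-4o-transcribe", "gpt-4o-mini-transcribe", "whisper-1")


def compare_models(client, audio_file, models=MODELS, language="lv", prompt=None):
    """Transcribe the same open file with each model and return {model: text}.

    `client` is expected to behave like openai.OpenAI (hypothetical helper,
    not part of the openai package).
    """
    results = {}
    for model in models:
        # Rewind so every request sends the file from the beginning.
        audio_file.seek(0)
        kwargs = {
            "model": model,
            "file": audio_file,
            "language": language,
            "response_format": "json",
        }
        # Only send a prompt when one is given, so runs with and without
        # the prompt can be compared.
        if prompt is not None:
            kwargs["prompt"] = prompt
        results[model] = client.audio.transcriptions.create(**kwargs).text
    return results


# Example usage (needs the openai package and a real API key):
# from openai import OpenAI
# client = OpenAI(api_key='API_KEY')
# with open("D:/Downloads/20250416_1332_24min.mp3", "rb") as f:
#     without_prompt = compare_models(client, f)
#     with_prompt = compare_models(client, f, prompt='transkribē šīs Saeimas debates')
```

Running it once with and once without the prompt, and ideally on a shorter clip as well, would show whether the random sentence depends on the prompt, the audio length, or neither.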
Any idea why these new transcription models generate random text, even though Latvian is one of the supported languages?
I’m using Python 3.12.4 and openai 1.75.0.