gpt-4o-transcribe-diarize

I have a use case where I must transcribe and diarize short audio files, maybe 1 minute or 2 at most, and often the speakers are mixing Spanish and English.

The issue is that I need the transcription verbatim, but the gpt-4o-transcribe-diarize model seems to slip in and out of translation mode, with some of the Spanish being translated to English.

The ‘language’ request body property doesn’t seem appropriate for mixed-language use cases like this.

Does anyone know of a strategy or a combination of request parameters that might aid me in getting verbatim transcriptions?

Thanks in advance!

This seems like an excellent use case for prompting the model. Unlike Whisper, where a prompt merely steers the continuation of previously produced text, gpt-4o-based models can also follow instructions passed there.

The ‘prompt’ field is not supported when using gpt-4o-transcribe-diarize, however.

One technique that may work, since this endpoint already requires you to furnish speaker names and audio samples to identify them, is to embed the language context directly in the names:

  • "known_speaker_names": ["Joe in English", "José en Español"]

Interesting, thanks for the idea. I’ll give the speaker name / language context combination a try.