gpt-4o-transcribe-diarize

I have a use case where I must transcribe and diarize short audio files, maybe 1 minute or 2 at most, and often the speakers are mixing Spanish and English.

The issue is that I need the transcription verbatim, but the gpt-4o-transcribe-diarize model seems to slip in and out of translation mode, with some of the Spanish being translated to English.

The ‘language’ request body property doesn’t seem appropriate for mixed-language use cases like this.

Does anyone know of a strategy or a combination of request parameters that might aid me in getting verbatim transcriptions?

Thanks in advance!

This seems like an excellent use case for prompting the model. Unlike Whisper, where a prompt merely steers the continuation of previously produced text, gpt-4o-based models can also follow instructions passed there.

The ‘prompt’ field is not supported when using gpt-4o-transcribe-diarize, however.

One technique that may work, since this endpoint already requires you to furnish speaker names and audio samples to identify them, is to embed the language context directly in the names:

  • "known_speaker_names": ["Joe in English", "José en Español"]

Interesting, thanks for the idea. I’ll give the speaker name / language context combination a try.