Introducing GPT-4o Transcribe Diarize: Now Available in the Audio API

GPT-4o Transcribe Diarize is a transcription model that identifies who is speaking when, so every segment of a transcript is associated with an individual speaker. It returns the new diarized_json response format, which gives each segment a speaker label along with start and end timestamps.
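To make the diarized_json description concrete, here is a minimal Python sketch that merges consecutive segments from the same speaker into readable turns. The payload shape (a `segments` list whose items carry `speaker`, `start`, `end`, and `text`) is an assumption based on the description above, and the sample data is hand-written for illustration, not real model output. Check the API reference for the exact field names.

```python
from typing import Any

# Hand-written sample shaped like the assumed diarized_json payload.
# Field names (speaker/start/end/text) are assumptions, not confirmed output.
sample = {
    "segments": [
        {"speaker": "A", "start": 0.0, "end": 2.1, "text": "Thanks for calling."},
        {"speaker": "A", "start": 2.1, "end": 4.0, "text": "How can I help?"},
        {"speaker": "B", "start": 4.3, "end": 6.8, "text": "I have a billing question."},
    ]
}


def merge_speaker_turns(segments: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge adjacent segments that share a speaker label into one turn."""
    turns: list[dict[str, Any]] = []
    for seg in segments:
        text = seg["text"].strip()
        if turns and turns[-1]["speaker"] == seg["speaker"]:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["end"] = seg["end"]
            turns[-1]["text"] += " " + text
        else:
            # Speaker changed (or first segment): start a new turn.
            turns.append(
                {
                    "speaker": seg["speaker"],
                    "start": seg["start"],
                    "end": seg["end"],
                    "text": text,
                }
            )
    return turns


def format_turns(turns: list[dict[str, Any]]) -> str:
    """Render turns as one '[start-end] speaker: text' line each."""
    return "\n".join(
        f"[{t['start']:.1f}s-{t['end']:.1f}s] {t['speaker']}: {t['text']}"
        for t in turns
    )


if __name__ == "__main__":
    print(format_turns(merge_speaker_turns(sample["segments"])))
```

Merging on the client side keeps the raw segments untouched, so the per-segment timestamps are still available if you need finer alignment later.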

What’s included:

  • Automatic Speaker Identification: GPT-4o Transcribe Diarize automatically detects and labels different speakers, simplifying multi-speaker audio transcription.
  • Speaker Reference Clips: Optionally improve accuracy by providing short (2–10 second) reference audio clips for up to four known speakers.
  • API Endpoint: Available through /v1/audio/transcriptions in the Transcription API.
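The items above could be combined into a request roughly as follows. This is a sketch under several assumptions that are not stated in this post: the model identifier `gpt-4o-transcribe-diarize`, the `known_speaker_names` and `known_speaker_references` fields, the data-URL encoding of reference clips, and the `chunking_strategy` setting. The helper names are hypothetical. Only the four-speaker limit, the diarized_json format, and the /v1/audio/transcriptions endpoint come from the announcement, so confirm the rest against the API reference before relying on it.

```python
import base64

MAX_KNOWN_SPEAKERS = 4  # limit stated in the announcement


def to_data_url(clip: bytes, mime_type: str = "audio/wav") -> str:
    """Encode raw audio bytes as a base64 data URL (assumed clip format)."""
    encoded = base64.b64encode(clip).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_speaker_fields(clips: dict[str, bytes]) -> dict[str, list[str]]:
    """Map speaker name -> clip bytes into the assumed request fields."""
    if len(clips) > MAX_KNOWN_SPEAKERS:
        raise ValueError(
            f"at most {MAX_KNOWN_SPEAKERS} known speakers are supported, "
            f"got {len(clips)}"
        )
    return {
        # Field names are assumptions; verify them in the API reference.
        "known_speaker_names": list(clips),
        "known_speaker_references": [to_data_url(c) for c in clips.values()],
    }


def transcribe_with_speakers(audio_path: str, clips: dict[str, bytes]):
    """Send the audio file to /v1/audio/transcriptions via the openai SDK."""
    from openai import OpenAI  # third-party package, imported lazily

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(audio_path, "rb") as audio_file:
        return client.audio.transcriptions.create(
            model="gpt-4o-transcribe-diarize",  # assumed model identifier
            file=audio_file,
            response_format="diarized_json",
            chunking_strategy="auto",  # assumed setting for longer audio
            extra_body=build_speaker_fields(clips),
        )
```

Validating the speaker count locally gives a clear error before any audio is uploaded. Clip length (2–10 seconds) is not checked here, since that would require decoding the audio.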

Speaker diarization has been frequently requested by our developer community; this feature represents a meaningful improvement to existing transcription tools.

Check out the documentation and the API reference to get started and explore detailed examples.

Looking forward to seeing how you utilize this feature!

Saw the model earlier in code pushed yesterday - it’s not been put on the models endpoint yet.

It’s available via playground.
I’ll unlist for the time being.