GPT-4o Transcribe Diarize is a transcription model that identifies who is speaking and when, so transcripts can associate each audio segment with an individual speaker. It introduces the new diarized_json response format, which provides a speaker label along with start and end timestamps for each segment.
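To give a feel for working with speaker-labeled segments, here is a minimal sketch that merges consecutive segments from the same speaker and prints a readable transcript. The field names (`speaker`, `start`, `end`, `text`) and the sample data are assumptions made for illustration; check the API reference for the actual diarized_json schema.

```python
from collections import namedtuple

# One uninterrupted stretch of speech by a single speaker.
Turn = namedtuple("Turn", "speaker start end text")


def merge_turns(segments):
    """Merge consecutive segments from the same speaker into one turn.

    Assumes each segment is a dict with 'speaker', 'start', 'end' and
    'text' keys (hypothetical names, not confirmed by the announcement).
    """
    turns = []
    for seg in segments:
        text = seg["text"].strip()
        if turns and turns[-1].speaker == seg["speaker"]:
            last = turns[-1]
            # Same speaker keeps talking: extend the previous turn.
            turns[-1] = Turn(last.speaker, last.start, seg["end"], f"{last.text} {text}")
        else:
            turns.append(Turn(seg["speaker"], seg["start"], seg["end"], text))
    return turns


def format_transcript(segments):
    """Render one line per turn as '[start-end] speaker: text'."""
    return "\n".join(
        f"[{t.start:.1f}-{t.end:.1f}] {t.speaker}: {t.text}"
        for t in merge_turns(segments)
    )


# Made-up sample data in the assumed shape.
sample_segments = [
    {"speaker": "A", "start": 0.0, "end": 2.5, "text": "Hi, thanks for joining."},
    {"speaker": "A", "start": 2.5, "end": 4.0, "text": "Shall we start?"},
    {"speaker": "B", "start": 4.2, "end": 5.0, "text": "Sure."},
]

if __name__ == "__main__":
    print(format_transcript(sample_segments))
```

Merging by speaker is only one way to present the output; keeping the raw segments is preferable when you need fine-grained timestamps, for example for subtitles.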
What’s included:
- Automatic Speaker Identification: GPT-4o Transcribe Diarize automatically detects and labels different speakers, simplifying multi-speaker audio transcription.
- Speaker Reference Clips: Optionally improve accuracy by providing short (2–10 second) reference audio clips for up to four known speakers.
- API Endpoint: Available through /v1/audio/transcriptions in the Transcription API.
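Before sending reference clips, it can help to check them locally against the limits listed above (2–10 seconds per clip, up to four known speakers). The sketch below does this for WAV data using only the Python standard library. The helper names are hypothetical, and the check is a client-side convenience, not part of the API; how the clips are attached to the request is not shown here and is covered in the API reference.

```python
import io
import wave

# Limits taken from the announcement text.
MIN_SECONDS = 2.0
MAX_SECONDS = 10.0
MAX_SPEAKERS = 4


def wav_duration_seconds(wav_bytes):
    """Return the duration of in-memory WAV data in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def validate_reference_clips(clips):
    """Check a {speaker_name: wav_bytes} mapping against the stated limits.

    Returns a list of human-readable problems; an empty list means the
    clips satisfy the duration and speaker-count limits.
    """
    problems = []
    if len(clips) > MAX_SPEAKERS:
        problems.append(
            f"{len(clips)} speakers given; at most {MAX_SPEAKERS} are supported"
        )
    for name, data in clips.items():
        duration = wav_duration_seconds(data)
        if not MIN_SECONDS <= duration <= MAX_SECONDS:
            problems.append(
                f"{name}: clip is {duration:.1f}s; "
                f"expected {MIN_SECONDS:.0f}-{MAX_SECONDS:.0f}s"
            )
    return problems


def make_silent_wav(seconds, rate=16000):
    """Build a silent mono 16-bit WAV in memory (for demonstration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


if __name__ == "__main__":
    clips = {"Ana": make_silent_wav(3), "Bo": make_silent_wav(1)}
    for problem in validate_reference_clips(clips):
        print(problem)
```

Running the check before upload surfaces clips that are too short or too long without spending a request on them.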
Speaker diarization has been a frequent request from our developer community, and this model adds it to the existing transcription tools.
Check out the documentation and the API reference to get started and explore detailed examples.
We look forward to seeing how you use this feature!