Timestamped Captions for TTS API [Feature Request]

itrydat · December 1, 2023, 11:52pm

I am currently using your Text-to-Speech API for an educational project aimed at providing accessible learning materials.

To enhance accessibility, it’d be great to have access to the timestamped transcript along with the TTS response. This would enable users to see captions that are perfectly synchronized with the speech output.
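To make the request concrete, here is a minimal sketch of what a client could do with word-level timestamps if they were returned alongside the audio. The `WordTiming` shape and the `to_webvtt` helper are hypothetical and only illustrate the idea; they are not part of any existing TTS response.

```python
from dataclasses import dataclass


@dataclass
class WordTiming:
    # Hypothetical shape for one word-level timestamp, in seconds.
    word: str
    start: float
    end: float


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(words: list[WordTiming], max_words: int = 7) -> str:
    """Group word timings into caption cues of at most max_words words."""
    lines = ["WEBVTT", ""]
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = format_timestamp(chunk[0].start)
        end = format_timestamp(chunk[-1].end)
        lines.append(f"{start} --> {end}")
        lines.append(" ".join(w.word for w in chunk))
        lines.append("")  # blank line terminates the cue
    return "\n".join(lines)
```

The resulting string could be served as a `.vtt` track next to the audio, so the browser handles caption display and timing.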

Here are a few challenges with different approaches I’ve tried:

Browser-based caption generators like “react-speech-recognition” are not sufficiently accurate or synchronized with the TTS output.
Using Whisper for a separate transcription adds complexity, leads to synchronization issues and increases cost.
Displaying the original text would not account for the varying speech rates, making it difficult to follow along.
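As an illustration of the last point, the simplest fallback is to guess word timings from the text and the total audio duration alone. The sketch below is a naive heuristic under that assumption (the function name is made up for this example): it spreads the duration across words by character count, so it ignores pauses, punctuation, and changes in speaking rate, which is exactly where the captions drift.

```python
def estimate_word_timings(text: str, duration: float) -> list[tuple[str, float, float]]:
    """Spread `duration` seconds across words in proportion to their length.

    Naive estimate only: real speech has pauses and varying rates,
    so these timings will drift from the actual audio.
    """
    words = text.split()
    total_chars = sum(len(w) for w in words)
    if total_chars == 0:
        return []

    timings = []
    cursor = 0.0
    for w in words:
        span = duration * len(w) / total_chars
        timings.append((w, cursor, cursor + span))
        cursor += span
    return timings
```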

Thanks!

albirrkarim · January 11, 2025, 12:38pm

I think you need my npm library

React / Vanilla JS Text to Speech with highlighting the words and sentences that are being spoken using audio files, text to speech API, and web speech synthesis API

It can produce per-word timestamps on the client side (no need to use Whisper). With just the input text and the audio file generated by the TTS API, you can do TTS with highlighting.

Beyond that capability, it has many powerful and flexible programmatic APIs that you can use directly.

Just check out my repo or try the demo website.
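For anyone curious how highlighting works once per-word start times exist, here is a minimal sketch of the lookup step. It is not the library's implementation; `active_word_index` is a hypothetical helper, and it assumes the start times are sorted in ascending order.

```python
from bisect import bisect_right


def active_word_index(starts: list[float], current_time: float) -> int:
    """Return the index of the word being spoken at `current_time`.

    `starts` holds each word's start time in seconds, sorted ascending.
    Returns -1 if playback has not reached the first word yet.
    """
    return bisect_right(starts, current_time) - 1
```

Calling this on every playback time update (for example, from the audio element's current time) gives the word to highlight in O(log n) per call.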
