Timestamped Captions for TTS API [Feature Request]

itrydat · December 1, 2023, 11:52pm

I am currently using your Text-to-Speech API for an educational project aimed at providing accessible learning materials.

To enhance accessibility, it would be great to receive a timestamped transcript along with the TTS response. This would let users see captions that are synchronized with the speech output.

Here are a few challenges with different approaches I’ve tried:

- Browser-based caption generators like “react-speech-recognition” are not sufficiently accurate or synchronized with the TTS output.
- Using Whisper for a separate transcription adds complexity, leads to synchronization issues, and increases cost.
- Displaying the original text would not account for varying speech rates, making it difficult to follow along.
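
For context on what a timestamped response would enable: if word-level timings were available from any source, turning them into captions is the easy part. Below is a minimal Python sketch, assuming the timings arrive as (word, start, end) tuples in seconds. The helper names `format_timestamp` and `words_to_webvtt` are hypothetical and not part of any TTS API.

```python
from typing import List, Tuple

# Assumed input shape: (word, start_seconds, end_seconds)
WordTiming = Tuple[str, float, float]


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def words_to_webvtt(words: List[WordTiming], max_words: int = 7) -> str:
    """Group word timings into WebVTT cues of at most max_words words each."""
    lines = ["WEBVTT", ""]
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = chunk[0][1]   # start time of the first word in the cue
        end = chunk[-1][2]    # end time of the last word in the cue
        text = " ".join(word for word, _, _ in chunk)
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")      # blank line terminates the cue
    return "\n".join(lines)
```

The resulting string can be served as a `.vtt` file and attached to an audio or video element as a caption track.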

Thanks!

albirrkarim · January 11, 2025, 12:38pm

I think my npm library does what you need:

React / Vanilla JS text to speech that highlights the words and sentences being spoken, using audio files, a text-to-speech API, or the Web Speech Synthesis API

It can produce a timestamp for each word on the client side (no need to use Whisper). With just the input text and the audio file generated by the TTS API, you can do TTS with highlighting.
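
To illustrate the general idea of deriving timings from only the text and the audio, here is a rough Python sketch. It is not how the library above works (its internals are not described in this thread). It is a naive heuristic that assumes the total audio duration is already known and that each word takes time proportional to its character count. The function name `estimate_word_timings` is hypothetical.

```python
from typing import List, Tuple


def estimate_word_timings(
    text: str, audio_duration: float
) -> List[Tuple[str, float, float]]:
    """Estimate (word, start, end) in seconds by splitting the audio
    duration across words in proportion to their character counts.

    Assumption: speaking time scales with word length. Real speech has
    pauses and varying rates, so these values are only approximate.
    """
    words = text.split()
    if not words or audio_duration <= 0:
        return []

    weights = [len(word) for word in words]
    total = sum(weights)

    timings = []
    elapsed = 0  # characters "spoken" so far
    for word, weight in zip(words, weights):
        start = audio_duration * elapsed / total
        elapsed += weight
        end = audio_duration * elapsed / total
        timings.append((word, start, end))
    return timings
```

For example, `estimate_word_timings("hi there", 7.0)` assigns the first 2 seconds to "hi" and the remaining 5 seconds to "there". Drift from the real audio grows with pauses and punctuation, which is why a forced-alignment or API-provided timestamp would be more accurate.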

Beyond that capability, it has a powerful and flexible programmatic API that you can use directly.

Just check out my repo or try the demo website.
