OpenAI recently released their own text-to-speech API, allowing you to generate a voice over for any text you have.
This got me thinking: could I use the tts-bot to automatically generate subtitles on top of the voice over?
It’s not entirely far-fetched, since subtitle files only consist of two things:
1. the text that is spoken
2. the time at which each piece of text is spoken
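These components map directly onto the SRT subtitle format. As a minimal sketch (standard library only, with made-up cue data), here is how (start, end, text) cues could be rendered into an .srt file:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{ms:03}"


def to_srt(cues) -> str:
    """Render (start_sec, end_sec, text) tuples as numbered SRT blocks."""
    blocks = []
    for i, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)


# Example: two short cues for a voice over.
print(to_srt([(0.0, 1.5, "Hello there."), (1.5, 3.2, "This is a voice over.")]))
```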
Obviously, we already have the text, since that is what we’re feeding to the tts-bot. So the only missing component is number 2: we just need to know at which time each word is spoken in the voice over.
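If per-word timings ever turn out to be available, turning them into subtitles is straightforward. A sketch, assuming hypothetical (word, start_sec, end_sec) data — nothing here is confirmed to come out of OpenAI’s API:

```python
def words_to_cues(word_timings, max_chars=32):
    """Group hypothetical (word, start_sec, end_sec) timings into cues.

    Whether the tts-bot exposes per-word timings at all is exactly the
    open question -- this only shows the grouping step.
    """
    cues, line_words = [], []
    line_start = line_end = 0.0
    for word, start, end in word_timings:
        if not line_words:
            line_start = start
        elif len(" ".join(line_words + [word])) > max_chars:
            # Current line is full: emit it and start a new one.
            cues.append((line_start, line_end, " ".join(line_words)))
            line_words, line_start = [], start
        line_words.append(word)
        line_end = end
    if line_words:
        cues.append((line_start, line_end, " ".join(line_words)))
    return cues


words = [("Hello", 0.0, 0.4), ("there,", 0.4, 0.8), ("this", 0.8, 1.0),
         ("is", 1.0, 1.1), ("a", 1.1, 1.2), ("test", 1.2, 1.6)]
print(words_to_cues(words))
```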
That raises the question: is this timing information exposed anywhere by OpenAI’s tts-bot, perhaps somewhere inside the tts-method itself? From a brief search online, I could not find anything relevant.
Nonetheless, I suspect this information exists somewhere in the bot’s internals, and could be parsed out for our use.