It would be nice if the TTS API allowed more than 4096 characters of input.
I have some long-form text that I’d like to turn into audio, and OpenAI’s text-to-speech is the best tool that I could find. Unfortunately, the hard limit of 4096 characters makes it a non-starter.
I know that Google has an asynchronous API for long-form audio TTS. Would it be possible to get something similar for OpenAI’s TTS model (which I much prefer to Google’s)?
I just ran into this as well. I’m sure they will bump up the limit at some point as this model gets scaled up. For now, I’m assuming I’ll have to stitch audio files together, which sucks but is probably just temporary.
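For anyone wondering what the stitching step might look like, here is a minimal sketch using only Python's standard-library `wave` module. It assumes each chunk was already saved as an uncompressed WAV file with the same channel count, sample width, and sample rate. The function name `join_wav_files` and the file names are made up for illustration, and compressed formats such as MP3 would need a different approach.

```python
import wave


def join_wav_files(paths, out_path):
    """Concatenate WAV files that share the same audio parameters."""
    params = None
    frames = []
    for path in paths:
        with wave.open(path, "rb") as src:
            p = src.getparams()
            fmt = (p.nchannels, p.sampwidth, p.framerate)
            if params is None:
                params = fmt
            elif fmt != params:
                # Mixing formats would produce garbled audio, so fail early.
                raise ValueError(f"{path} has different audio parameters")
            frames.append(src.readframes(src.getnframes()))
    if params is None:
        raise ValueError("no input files given")
    with wave.open(out_path, "wb") as dst:
        dst.setnchannels(params[0])
        dst.setsampwidth(params[1])
        dst.setframerate(params[2])
        for chunk in frames:
            dst.writeframes(chunk)
```

Usage would be something like `join_wav_files(["part0.wav", "part1.wav"], "full.wav")`, with the parts listed in reading order.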
I ran into this a while ago and made a little Python library/CLI tool to automate the chunking and stitching. Hopefully others will find it useful. You can find it on PyPI/pip: tts-joinery.
Also, the pronunciation and accent quality may vary for long texts. For example, with long Brazilian Portuguese texts, the accent turns into one from Portugal. So chunking the content into smaller parts is the way to avoid both the character limit and the accent drift.
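A rough sketch of the chunking idea, in case it helps: split the text at sentence boundaries and pack sentences into chunks that stay under the 4096-character limit mentioned above. The names `chunk_text` and `MAX_CHARS` are hypothetical, the sentence splitting is a simple regex rather than a real tokenizer, and this is not taken from any particular library.

```python
import re

MAX_CHARS = 4096  # input limit discussed in this thread


def chunk_text(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit gets a hard split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as a separate TTS request. If the accent drift shows up before 4096 characters, passing a smaller `max_chars` is an easy thing to try.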