Seeking Guidance on Text-to-Speech (TTS): Need Help and Advice

moazeldsoky8 · January 22, 2024, 11:35pm

I have an idea to create a Text-to-Speech (TTS) model, but I haven’t worked on such a project before. I’m seeking advice on tutorials, articles, books, and papers to understand how to fine-tune existing models, build a model from scratch, and manage aspects such as training time and data handling. Any guidance is appreciated. Thank you!

Macha · January 23, 2024, 12:21am

Hey there and welcome to the forum!

So, to start, fine-tuning an existing TTS model is a doable approach for an individual. Building a model from scratch, which means training on massive amounts of data, is going to be a lot of work and quite expensive. And by expensive, I mean potentially millions of dollars’ worth of compute. So realistically speaking, you would be looking at fine-tuning a pre-existing model.

The next question becomes: what are you trying to fine-tune for? Essentially, what are you trying to improve or change about the model? While I haven’t personally fine-tuned a TTS model before, I suspect there would be a bit more work involved in pre-processing the data than with other kinds of fine-tuning. There’s not as much material on TTS fine-tuning as there is for other models. Also, iirc, OpenAI’s TTS model is still fairly new, and I don’t think they have released a fine-tunable version yet (not to be confused with Whisper, which is STT).
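To give a rough idea of what that pre-processing might look like, here is a minimal sketch that cleans transcripts and drops clips that are too short or too long. It assumes the dataset is already a list of (audio path, duration in seconds, transcript) records. The names `normalize_text` and `filter_clips` and the duration limits are made up for illustration and do not come from any particular TTS toolkit.

```python
import re
import unicodedata

# Hypothetical limits; real projects tune these per model and dataset.
MIN_SECONDS = 1.0
MAX_SECONDS = 15.0


def normalize_text(text: str) -> str:
    """Normalize Unicode forms and collapse runs of whitespace in a transcript."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def filter_clips(clips):
    """Keep (path, seconds, transcript) records with usable duration and non-empty text."""
    kept = []
    for path, seconds, transcript in clips:
        cleaned = normalize_text(transcript)
        if cleaned and MIN_SECONDS <= seconds <= MAX_SECONDS:
            kept.append((path, seconds, cleaned))
    return kept


if __name__ == "__main__":
    # Illustrative records only; the file names are placeholders.
    sample = [
        ("a.wav", 3.2, "  Hello   World "),
        ("b.wav", 0.4, "Hi"),
        ("c.wav", 5.0, "   "),
    ]
    for record in filter_clips(sample):
        print(record)
```

Case and punctuation are left untouched here because they can carry prosody cues for TTS; whether to normalize them further depends on the model being fine-tuned.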

Perhaps to get started, check out ElevenLabs?

GodTaube · June 13, 2024, 11:19am

Just a quick comment because I’m really frustrated. ElevenLabs’ demo quality is excellent, but their enterprise customer support is a 1/10. You get passed from one person to the next, likely because they are still small and overwhelmed with inquiries. I’m simply trying to get some standard documents to evaluate whether we can work with them, but it seems like they aren’t even reading the emails. I hope OpenAI will develop its own fine-tuned TTS solution.
