Transcript -> Translate -> Text to Speech

Hello, I’m discovering the world of artificial intelligence and I lack the knowledge to do what I’d like to do. I would like to transcribe videos, translate them, and create a new soundtrack in my language, French. The work involves dozens of hours of video. I’ve tried a free open-source TTS, MaryTTS, but the result is not up to my expectations. I’d like to talk to someone who can guide me towards the right tools for this project, even if I have to use a paid tool. I’ve used Whisper in a Google Colab notebook and I’m thinking of using DeepL for a large part of the translation, but for the TTS I don’t know. What would be the best value for money for such a tool? Do you have a link to a beginner’s tutorial for OpenAI TTS?

https://platform.openai.com/docs/guides/text-to-speech

The cadence of the TTS output can quickly drift out of sync with the original utterances and their closed captions, so even with labeled, translated snippets you may still need a lot of audio editing and tweaking in your video software.

I was thinking of using JSON files and working through them in sequence.
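A minimal sketch of what such a JSON file could look like in practice, assuming one record per segment. The field names (`id`, `start`, `end`, `text`, `text_fr`) and the helper names are hypothetical, chosen only for illustration. Each step (transcription, translation, TTS) fills in its own field, so a long job can be stopped and resumed at the first segment that is still empty.

```python
import json
from pathlib import Path


def save_manifest(path, segments):
    """Write the list of segment records to a JSON file (UTF-8, accents kept readable)."""
    Path(path).write_text(
        json.dumps(segments, ensure_ascii=False, indent=2), encoding="utf-8"
    )


def load_manifest(path):
    """Read the list of segment records back from the JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def next_pending(segments, field):
    """Return the first segment, in order, whose `field` is missing or empty, else None."""
    for seg in segments:
        if not seg.get(field):
            return seg
    return None
```

With this layout, the translation step would repeatedly call `next_pending(segments, "text_fr")`, fill in the translation, and save the file again before moving on.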

The point being made is that if someone in a video speaks very quickly and the TTS speaks at a normal pace, you will have a time mismatch. Automated translation of speech is a complex task and requires temporal adjustment of the audio as well as the translation itself. In other words, you may need to measure the length of the original speech and time-stretch the translated TTS audio to match.
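The arithmetic behind that adjustment can be sketched as follows. This is only an illustration: the function names and the clamping bounds (0.8 to 1.25) are assumptions, picked because heavy stretching tends to sound unnatural. The computed rate would then be handed to whatever audio tool does the actual time-stretch. As far as I know, the OpenAI speech endpoint also accepts a `speed` setting, so check its documentation for the allowed range before relying on it.

```python
def stretch_factor(original_s, tts_s, min_rate=0.8, max_rate=1.25):
    """Playback-rate multiplier that makes a TTS clip fit the original time slot.

    A rate above 1.0 means the TTS clip must be sped up; below 1.0, slowed down.
    The result is clamped to [min_rate, max_rate] (illustrative bounds).
    """
    if original_s <= 0 or tts_s <= 0:
        raise ValueError("durations must be positive")
    rate = tts_s / original_s
    return max(min_rate, min(max_rate, rate))


def leftover_after_stretch(original_s, tts_s, rate):
    """Seconds by which the stretched clip still overruns (+) or underruns (-) the slot."""
    return tts_s / rate - original_s
```

For a slot of 8.64 seconds and a 10.8 second TTS clip, `stretch_factor(8.64, 10.8)` gives a rate of 1.25 and the clip fits. If the leftover is still positive after clamping, shortening the translation is probably a better fix than stretching further.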

I had indeed thought of all that. A friend of mine managed to do just that by adjusting the pitch and playback speed per segment. The aim would be to process segments of a few seconds each. For example, after a Whisper transcription I have this:

00:43.320 → 00:51.960
And I’ve had students come into my fold and question whether or not it was something that could work in live accounts

00:51.960 → 00:57.320
because they themselves couldn’t make it work in either a demo or paper trading.

00:57.320 → 01:04.600
Because either they could not find it in themselves to submit to the process and the time required for their own individual development
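Segments in this layout can be turned into per-segment records with start and end times in seconds, which is what the duration matching needs. A minimal sketch, assuming the text format shown above (a `MM:SS.mmm → MM:SS.mmm` line followed by the text). The function name and record fields are hypothetical, and timestamps with an hours part are not handled.

```python
import re

# Matches lines such as "00:43.320 → 00:51.960" (also accepts "-->" or "->").
TIMESTAMP_LINE = re.compile(
    r"^\s*(\d+):(\d+(?:\.\d+)?)\s*(?:→|-->|->)\s*(\d+):(\d+(?:\.\d+)?)\s*$"
)


def parse_segments(text):
    """Parse timestamped transcript text into a list of {start, end, text} records."""
    segments = []
    current = None
    for line in text.splitlines():
        match = TIMESTAMP_LINE.match(line)
        if match:
            m1, s1, m2, s2 = match.groups()
            current = {
                "start": round(int(m1) * 60 + float(s1), 3),
                "end": round(int(m2) * 60 + float(s2), 3),
                "text": "",
            }
            segments.append(current)
        elif current is not None and line.strip():
            # Text lines belong to the most recent timestamp line.
            current["text"] = (current["text"] + " " + line.strip()).strip()
    return segments
```

The duration of each slot is then `end - start`, for example 8.64 seconds for the first segment above.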
