Transcript -> Translate -> Text to Speech

SatoBlock · November 10, 2023, 11:10am

Hello, I’m discovering the world of artificial intelligence and I lack the knowledge to do what I’d like to do. I would like to transcribe, translate, and create a new video soundtrack in my language, French. The work involves dozens of hours of video. I’ve tried a free open-source TTS, MaryTTS, but the result is not up to my expectations. I’d like to talk to someone who can guide me towards the right tools for this project, even if I have to use a paid tool. I’ve used Whisper in a Google Colab notebook, and I’m thinking of using DeepL for a large part of the translation, but for the TTS I don’t know. What would be the best value for money for such a tool? Do you have a link to a beginner’s tutorial for OpenAI TTS?

_j · November 10, 2023, 11:13am

https://platform.openai.com/docs/guides/text-to-speech

The cadence of TTS could quickly get out of sync with the original utterances and their closed captions, so it might still take a lot of audio editing and tweaking in your video software, even with labeled translated snippets.

SatoBlock · November 10, 2023, 11:15am

I was thinking of using JSON files and working through the segments in sequence.

Foxalabs · November 10, 2023, 11:37am

The point being made is that if someone in a video speaks very quickly and the TTS speaks at a normal pace, you will have a time mismatch. Automated translation that stays in sync is a complex task: it requires temporal adjustment of the audio as well as the translation itself, i.e. you may need to measure the length of the original speech and time-stretch the translated TTS audio to match.
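
A minimal sketch of that adjustment, assuming you already know the duration of the original segment and of the generated TTS clip in seconds. The function names are hypothetical. The output string targets ffmpeg's `atempo` audio filter; older ffmpeg builds accept only factors between 0.5 and 2.0 per `atempo` instance, so larger ratios are split into several chained steps here.

```python
def tempo_factor(original_s: float, tts_s: float) -> float:
    """Speed factor to apply to the TTS clip so it lasts as long as the original.

    A value above 1.0 means the TTS clip must be sped up, below 1.0 slowed down.
    """
    if original_s <= 0 or tts_s <= 0:
        raise ValueError("durations must be positive")
    return tts_s / original_s


def atempo_chain(factor: float, lo: float = 0.5, hi: float = 2.0) -> str:
    """Build an ffmpeg audio filter string whose steps each stay within [lo, hi]."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    steps = []
    remaining = factor
    # Peel off maximum-size steps until the remainder fits in the allowed range.
    while remaining > hi:
        steps.append(hi)
        remaining /= hi
    while remaining < lo:
        steps.append(lo)
        remaining /= lo
    steps.append(remaining)
    return ",".join(f"atempo={s:.4f}" for s in steps)


if __name__ == "__main__":
    # Original speech lasts 8.64 s, the French TTS clip lasts 10.8 s.
    factor = tempo_factor(8.64, 10.8)
    print(atempo_chain(factor))
```

The resulting string can be passed to ffmpeg as `-filter:a "<string>"`. Large factors will sound unnatural, so in practice it may be better to shorten the translated text when the ratio drifts far from 1.0.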

SatoBlock · November 10, 2023, 12:17pm

I had indeed thought of all that. A friend of mine managed to do just that by playing with the pitch/playback speed per segment. The aim would be to process segments of a few seconds each. For example, after a Whisper transcription I have this:

00:43.320 → 00:51.960
And I’ve had students come into my fold and question whether or not it was something that could work in live accounts

00:51.960 → 00:57.320
because they themselves couldn’t make it work in either a demo or paper trading.

00:57.320 → 01:04.600
Because either they could not find it in themselves to submit to the process and the time required for their own individual development
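
A rough sketch of how such output could be turned into per-segment JSON records, assuming every timestamp line has the `MM:SS.mmm → MM:SS.mmm` shape shown above (no hours field) and is followed by one or more lines of text. The function name `parse_segments` and the record keys are hypothetical, not part of Whisper.

```python
import json
import re

# Matches lines such as "00:43.320 → 00:51.960" (also accepts "-->" as the arrow).
STAMP = re.compile(r"^\s*(\d+):(\d{2}\.\d+)\s*(?:→|-->)\s*(\d+):(\d{2}\.\d+)\s*$")


def parse_segments(text: str) -> list:
    """Return a list of dicts with start, end, duration (seconds) and text."""
    segments = []
    current = None
    for line in text.splitlines():
        match = STAMP.match(line)
        if match:
            start = int(match.group(1)) * 60 + float(match.group(2))
            end = int(match.group(3)) * 60 + float(match.group(4))
            current = {
                "start": round(start, 3),
                "end": round(end, 3),
                "duration": round(end - start, 3),
                "text": "",
            }
            segments.append(current)
        elif current is not None and line.strip():
            # Text lines belong to the most recent timestamp line.
            current["text"] = (current["text"] + " " + line.strip()).strip()
    return segments


if __name__ == "__main__":
    sample = (
        "00:51.960 → 00:57.320\n"
        "because they themselves couldn’t make it work in either a demo or paper trading.\n"
    )
    print(json.dumps(parse_segments(sample), ensure_ascii=False, indent=2))
```

The `duration` field is the target length for the translated audio of that segment, so each record carries what is needed to translate, synthesize, and then fit the clip back into its slot.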
