I am currently working on a project focused on video conferencing. One aspect of the project involves accommodating users who do not understand English by allowing them to listen to the conference in real time in a language of their choice.
How can this be achieved? Which model should I use?
If there isn’t an existing model for this purpose, could I generate text (speech-to-text) and then pass that text to a model that translates it into the selected language in real time?
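Something like this two-step pipeline is what I have in mind. This is only a rough sketch assuming the current openai Python SDK; the model names and the translation prompt are placeholders, not recommendations:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def transcribe_then_translate(audio_path: str, target_language: str) -> str:
    # Step 1: speech-to-text on a recorded chunk of conference audio.
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

    # Step 2: translate the transcript text with a chat model.
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[
            {
                "role": "system",
                "content": f"Translate the user's message into {target_language}. "
                           "Reply with the translation only.",
            },
            {"role": "user", "content": transcript.text},
        ],
    )
    return completion.choices[0].message.content


# Called once per audio chunk, so latency depends on chunk length:
# print(transcribe_then_translate("chunk_0001.wav", "Spanish"))
```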
Currently, the transcription model on the API does not support streaming audio responses, which may result in significant lag. There is also a translation endpoint, but it currently only translates to English; it may support other languages in the future.
The OpenAI Whisper model does not have a publicly available real-time mode.
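For reference, here is what the two endpoints mentioned above look like with the current openai Python SDK. This is a sketch, not production code; both calls upload a complete file and wait for the full result, which is where the lag comes from:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Transcription: audio in, text out in the spoken language.
with open("chunk.wav", "rb") as f:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

# Translation: audio in, English text out (English only, for now).
with open("chunk.wav", "rb") as f:
    english = client.audio.translations.create(model="whisper-1", file=f)

print(transcript.text)
print(english.text)
```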
I have done something similar using other services on the web, and my current choice is AssemblyAI, but there are several others, like Deepgram, Rev, AWS Transcribe, and so on. (Plugging one of these names into Google will typically return paid ads for all the others, because they bid on each other's traffic. And Google cashes the check :-D)
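If it helps, the shape of a streaming integration looks roughly like this. It is a sketch of AssemblyAI's v2 real-time WebSocket API written from memory, so the endpoint URL, headers, and message fields may have changed; verify against their current docs before relying on it:

```python
import asyncio
import base64
import json

import websockets  # pip install websockets

API_KEY = "YOUR_ASSEMBLYAI_KEY"  # placeholder
URL = "wss://api.assemblyai.com/v2/realtime/ws?sample_rate=16000"


async def stream(pcm_chunks):
    """pcm_chunks: an async iterator of raw 16 kHz, 16-bit mono PCM bytes."""
    # Note: newer versions of the websockets library rename extra_headers
    # to additional_headers.
    async with websockets.connect(URL, extra_headers={"Authorization": API_KEY}) as ws:

        async def send():
            async for chunk in pcm_chunks:
                payload = {"audio_data": base64.b64encode(chunk).decode()}
                await ws.send(json.dumps(payload))
            await ws.send(json.dumps({"terminate_session": True}))

        async def receive():
            async for message in ws:
                msg = json.loads(message)
                # Partial transcripts arrive continuously; final ones are
                # stable and punctuated, so those are what you'd translate.
                if msg.get("message_type") == "FinalTranscript":
                    print(msg.get("text"))

        await asyncio.gather(send(), receive())
```

The key point is that you get text back while the speaker is still talking, so you can feed finalized fragments straight into a translation step instead of waiting for a whole recording.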