How to perform real-time English-to-Chinese translation using Whisper and GPT-3.5-Turbo?

Garfield1978 · March 21, 2023, 12:01pm

Hello everyone, I have successfully translated an English audio file to Chinese using Whisper and GPT-3.5-Turbo. However, I am unsure how to achieve real-time English-to-Chinese or Chinese-to-English translation when using a microphone. Can anyone advise me on how to accomplish this?

klcogluberk · March 21, 2023, 12:11pm

Something like this came to mind: 1- Store the sound data received by the microphone with PyAudio somewhere 2- Send real-time received data to the model with the web socket get the answer, and use it.

However, recently, the OpenAI APIs have been experiencing latency and connection errors due to the intensity. This can negatively affect your process

LuigiVampa · April 9, 2023, 11:46am

In C# I’ve been using System.Speech.Recognition library to capture the boundaries of someone’s speech.

        private void loadSpeechRecognition()
        {
            // Create an in-process speech recognizer for the en-GB locale.  
            SpeechRecognitionEngine recogniser = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-GB"));
            recogniser.LoadGrammar(new DictationGrammar());

            // Add a handler for the speech recognized event.  
            recogniser.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);

            // Configure input to the speech recognizer.  
            recogniser.SetInputToDefaultAudioDevice();

            // Start asynchronous, continuous speech recognition.  
            recogniser.RecognizeAsync(RecognizeMode.Multiple);
        }

When the event is triggered it records the resulting audio to a wav file.

        public void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
             using (MemoryStream memoryStream = new MemoryStream())
             {
                 e.Result.Audio.WriteToWaveStream(memoryStream);
                 using (FileStream file = new FileStream("file.wav", FileMode.Create, FileAccess.Write))
                 {
                     memoryStream.WriteTo(file);
                 }
                 _ = transcribe();
             }
        }

It then sends the request to the audio api to transcribe it before sending the transcription to the chat api.

    private async Task transcribe()
    {
         HttpClient client = new HttpClient();
         HttpRequestMessage request = new HttpRequestMessage();

         request = new HttpRequestMessage(HttpMethod.Post, "https://api.openai.com/v1/audio/transcriptions");
         request.Headers.Add("Authorization", "Bearer " + api);

         var content = new MultipartFormDataContent();
         content.Add(new StringContent("whisper-1"), "model");
         content.Add(new ByteArrayContent(File.ReadAllBytes(@"E:\Chris\Script\WinForm\DesktopGPT\DesktopGPT\bin\Debug\net6.0-windows\file.wav")), "file", Path.GetFileName("file.wav"));
         request.Content = content;

         HttpResponseMessage response = await client.SendAsync(request);
          string responseBody = await response.Content.ReadAsStringAsync();
          var deserializedResponse = JsonConvert.DeserializeObject<AudioResponse>(responseBody);

          _ = GetChatAsync(deserializedResponse.strText);
    }

If you swapped the transcription api for the translation api that should do roughly what you need. Until the whisper model can take a stream I think actual real-time is off the table but using the speeech recognition library to define the chunks of speech works a lot better than uploading chunks of an arbitrary frequency I find. It’s slow and clunky but it does listen in real time even if you wait a few for it to respond.

LuigiVampa · April 9, 2023, 11:52am

SpeechRecognitionEngine Class (System.Speech.Recognition) | Microsoft Learn

fasc.ed · October 10, 2023, 6:52pm

I have created a curriculum that narrates evolving images. I am now seeking someone to translate English into Spanish while keeping the timing of the Spanish sentences the same as the English sentences so that the narration remains connected to the evolving images.

Fasc.ed@gmail.com
240-460-9000

Topic		Replies	Views
Whisper-1 joint translation and transcription API	6	2761	October 21, 2024
Help Putting Whisper Code Into Python Script API	2	2141	January 29, 2024
ChatGPT API TTS streaming API api	2	3337	June 1, 2024
GPTs with Custom Actions by Whisper API and TTS Feedback gpts	18	6226	December 4, 2023
Whisper Streaming Strategy API chatgpt , whisper , streaming	5	7500	September 2, 2024

How to perform real-time English-to-Chinese translation using Whisper and GPT-3.5-Turbo?

Related topics