ChatGPT API TTS streaming

I am developing an iPhone app that can converse in real time using the ChatGPT API.

  1. Transcribe audio to text using Whisper.
  2. Send the transcription hands-free to the ChatGPT API.
  3. Stream ChatGPT’s response into the chat interface as text in real time.
  4. Once the response is complete, use text-to-speech (TTS) to read the text aloud.

I have managed to implement up to step 3, but there is a noticeable lag between the completion of step 3 and the start of step 4 when conversing hands-free. I saw on the OpenAI site that streaming real-time audio is possible. I would appreciate it if someone who has experience with this could share their insights.
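Since step 4 waits for the whole response, one approach that might shrink the gap is to hand each sentence to TTS as soon as it is complete, rather than waiting for the stream to finish. Below is a minimal sketch of that buffering logic in Python (the same idea should port to Swift). Everything in it is an assumption for illustration: `sentences_from_stream`, `speak_as_it_streams`, and the `speak` callback are hypothetical names, and the sentence-splitting rule is deliberately naive.

```python
import re
from typing import Callable, Iterable, Iterator

# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
# This is naive (it will also split after abbreviations like "Dr.").
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as the streamed text contains them."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is followed by a sentence boundary,
        # so it is complete and can be spoken right away.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be mid-sentence; keep it for the next chunk.
        buffer = parts[-1]
    # The stream has ended: flush whatever text is left.
    tail = buffer.strip()
    if tail:
        yield tail


def speak_as_it_streams(chunks: Iterable[str], speak: Callable[[str], None]) -> None:
    """Call the (hypothetical) speak callback once per completed sentence."""
    for sentence in sentences_from_stream(chunks):
        speak(sentence)


if __name__ == "__main__":
    # Stand-in for streamed response fragments; print stands in for a TTS call.
    demo_chunks = ["Hel", "lo there. How", " are you? I am", " fine"]
    speak_as_it_streams(demo_chunks, print)
```

Here `chunks` would be the text fragments arriving in step 3, and `speak` would wrap whatever TTS call the app uses (on iOS that could be something built on `AVSpeechSynthesizer`, for example). The first sentence can then start playing while the rest of the response is still streaming, though the TTS backend would need to queue utterances so they play in order.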

I’m working on a similar project and was wondering if you managed to resolve the issue with the noticeable lag between steps 3 and 4.

If you were able to solve it, I would greatly appreciate it if you could help me with my project as well. I would be happy to discuss the details and terms of collaboration.

It’s great to see this. I had a similar idea, but I am still researching the tech stack. I found that many platforms ship some sort of built-in text-to-speech API for accessibility, such as speechSynthesis in the Web Speech API, but the voice quality is noticeably worse.

I am also curious whether there is any way to chain a sequence of OpenAI API calls into a single request, but it seems there isn’t. I guess the closest we can get is to deploy your own server on Azure.

Real-time apps are super sensitive to lag, so we need to find a way to manage it properly. If you have any good solutions, please keep us updated.

