My main reason not to use Whisper is that it is not made for real-time interaction (streaming). For now (let's see what happens in the near future with voice input in GPT-4o), you should check other STT services. For me, the best one in terms of reliability, languages, speed, and cost is Amazon Transcribe, and I have already tried a lot of them. The answer from GPT-4o itself is really fast. As for the last step, TTS generation, I also tried everything available on the market, and my current choice is Google Cloud TTS: it is pretty fast, not expensive, and the quality is good enough for my case. With this setup I get a fairly natural conversational feel in terms of timing.
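If it helps, here is a rough sketch of how you could time each step (STT, then the model, then TTS) to see where the delay comes from. This is not my actual code: `VoiceTurn`, `fake_stt`, `fake_llm`, and `fake_tts` are made-up names, and the fake stages are placeholders you would replace with real calls to your STT, LLM, and TTS providers.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class VoiceTurn:
    """Runs one conversational turn and records how long each stage took."""
    transcribe: Callable[[bytes], str]   # STT stage (hypothetical hook)
    respond: Callable[[str], str]        # LLM stage (hypothetical hook)
    synthesize: Callable[[str], bytes]   # TTS stage (hypothetical hook)
    timings: Dict[str, float] = field(default_factory=dict)

    def _timed(self, name, fn, arg):
        # Measure wall-clock time spent in a single stage.
        start = time.perf_counter()
        result = fn(arg)
        self.timings[name] = time.perf_counter() - start
        return result

    def run(self, audio_in: bytes) -> bytes:
        text = self._timed("stt", self.transcribe, audio_in)
        reply = self._timed("llm", self.respond, text)
        return self._timed("tts", self.synthesize, reply)

    def total_latency(self) -> float:
        # Sum of the per-stage times, in seconds.
        return sum(self.timings.values())


# Placeholder stages so the sketch runs without any cloud account.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


turn = VoiceTurn(fake_stt, fake_llm, fake_tts)
audio_out = turn.run(b"hello")
print(turn.timings, turn.total_latency())
```

Because the three stages run one after another here, the total is just the sum of the parts, so whichever stage dominates `timings` is the one worth swapping for a faster or streaming service.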