Is there an API that recognizes from voice data and responds to voice data?

TakashiOta · December 22, 2023, 3:04am

Is there an API that not only sends an audio file to OpenAI Server and generates text, but also recognizes the audio data and responds audio data.

I know speechToText API and TextToSpeech is prepared.
But, if I try to recognize form voice data and responds to voice data, I will connect to OpenAI Server three times.

【Three times】
First：SpeechToText
Second：Request chatText
Third：TextToSpeech

I want to know an API that can perform the above processing in one time.

wclayf · December 22, 2023, 5:37am

This is a great idea and much needed for doing a normal speech conversational flow. They could make it where the upstream and downstream audio channels are kept open long term, so that you can actually interrupt the response speech, and say “stop, you misunderstood” or whatever, to cut them off, just like with a human.

Or maybe not everyone’s rude and cuts people off. haha. But I don’t think this API feature exists yet. Hopefully they are working on it!

Topic		Replies	Views
Text completion and get voice response Feedback	1	61	September 13, 2024
ChatGPT API TTS streaming API api	2	2982	June 1, 2024
Implementing audio conversation with AI API	8	3425	April 29, 2024
Voice to voice via API possible? API gpt-4 , api	1	481	May 27, 2024
Getting audio stream from chat completion API API chatgpt , api , tts	5	3826	December 25, 2023

Is there an API that recognizes from voice data and responds to voice data?

Related topics