Having an audio conversation with AI seems like a game changer, and I want to integrate that into my app. Looking at the API, the way to do it currently is to:
1. record audio
2. send it to the transcription endpoint
3. send the transcribed text to the chat endpoint to get the AI text response
4. send the AI text response to the text-to-speech endpoint
I haven’t implemented it yet, but it seems to me like a lot of steps that might cause lag issues (a rough sketch of the whole pipeline is below). Is this a feasible way to go, and are there any plans to extend the API to do steps 2-4 in one go, e.g. an audio dialogue endpoint?
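For reference, here's what I have in mind for steps 2-4. This is an untested sketch using the official `openai` Node SDK, and it assumes Node, an `OPENAI_API_KEY` environment variable, and a pre-recorded file name of my own invention:

```js
import fs from "node:fs";
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Step 2: transcribe the recorded audio ("recording.mp3" is a placeholder)
const transcription = await openai.audio.transcriptions.create({
  file: fs.createReadStream("recording.mp3"),
  model: "whisper-1",
});

// Step 3: send the transcript to the chat endpoint
const chat = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: transcription.text }],
});
const reply = chat.choices[0].message.content;

// Step 4: turn the AI reply into speech and save it
const speech = await openai.audio.speech.create({
  model: "tts-1",
  voice: "alloy",
  input: reply,
});
fs.writeFileSync("reply.mp3", Buffer.from(await speech.arrayBuffer()));
```

Each step waits on the previous one, which is where I'd expect the lag to add up.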
IIRC, steps 3 and 4 support streaming, which should reduce lag. I'm not aware of any more consolidated approach, though.
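To illustrate the chat side of that, here's a minimal, untested sketch with the `openai` Node SDK; the `stream: true` flag on the chat completions call is what does the work:

```js
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// stream: true yields tokens as they are generated instead of one final payload
const stream = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Hello!" }],
  stream: true,
});

for await (const chunk of stream) {
  // Each chunk carries a small delta of the reply; hand it to the TTS step
  // (or the UI) as it arrives rather than waiting for the full text.
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```

That way the TTS step can start on the first sentences while the rest of the reply is still being generated.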
Have you had success accessing the text-to-speech endpoint? My GPT never gets an MP3 back…
Yes, this works for me:
```js
import axios from "axios";

// `text` is whatever you want spoken, e.g. the chat completion's reply
const generateSpeech = async (text) => {
  try {
    const response = await axios.post(
      "https://api.openai.com/v1/audio/speech",
      {
        model: "tts-1",
        input: text,
        voice: "alloy",
      },
      {
        headers: {
          Authorization: `Bearer ${import.meta.env.VITE_OPENAI_API_KEY}`,
        },
        // The endpoint returns binary audio, so ask axios for a Blob
        responseType: "blob",
      }
    );
    const url = window.URL.createObjectURL(new Blob([response.data]));
    const audio = new Audio(url);
    audio.play();
  } catch (error) {
    console.error(error);
  }
};
```
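One caveat before copying this: Vite inlines any `VITE_`-prefixed env var into the client bundle, so the API key is visible to every visitor; for a public app you'd want to proxy the request through your own backend instead. It's also worth releasing the blob URL once playback finishes, e.g.:

```js
// Free the blob-backed object URL when the audio is done playing
audio.addEventListener("ended", () => URL.revokeObjectURL(url));
```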