Implementing audio conversation with AI

machin1st · November 9, 2023, 12:15pm

Having an audio conversation with AI seems like a game changer, and I want to integrate that into my app. Looking at the API, the way to do it currently seems to be:

1. Record audio.
2. Send it to the transcription endpoint.
3. Send the transcribed text to the chat endpoint to get the AI text response.
4. Send the AI text response to the text-to-speech endpoint.
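For reference, here is a minimal sketch of steps 2-4 chained together, assuming Node 18+ (global fetch, FormData and Blob) and an API key in an OPENAI_API_KEY environment variable. The chat model name, the "speech.webm" filename and the helper names (transcribe, chat, speak, runVoiceTurn) are illustrative assumptions, not part of the original post. The step functions are passed in as a parameter so they can be swapped or stubbed.

```javascript
// Sketch of one voice "turn": recorded audio -> transcript -> reply -> speech.
// Assumes Node 18+ and OPENAI_API_KEY in the environment.
const API = "https://api.openai.com/v1";
const authHeader = () => ({ Authorization: `Bearer ${process.env.OPENAI_API_KEY}` });

// Step 2: upload the recorded audio to the transcription endpoint.
async function transcribe(audioBlob) {
  const form = new FormData();
  form.append("file", audioBlob, "speech.webm"); // extension tells the API the format
  form.append("model", "whisper-1");
  const res = await fetch(`${API}/audio/transcriptions`, { method: "POST", headers: authHeader(), body: form });
  if (!res.ok) throw new Error(`transcription failed: ${res.status}`);
  return (await res.json()).text;
}

// Step 3: send the conversation so far to the chat endpoint.
async function chat(messages) {
  const res = await fetch(`${API}/chat/completions`, {
    method: "POST",
    headers: { ...authHeader(), "Content-Type": "application/json" },
    body: JSON.stringify({ model: "gpt-3.5-turbo", messages }), // placeholder model
  });
  if (!res.ok) throw new Error(`chat failed: ${res.status}`);
  return (await res.json()).choices[0].message.content;
}

// Step 4: convert the reply to speech (MP3 bytes by default).
async function speak(text) {
  const res = await fetch(`${API}/audio/speech`, {
    method: "POST",
    headers: { ...authHeader(), "Content-Type": "application/json" },
    body: JSON.stringify({ model: "tts-1", voice: "alloy", input: text }),
  });
  if (!res.ok) throw new Error(`speech failed: ${res.status}`);
  return new Uint8Array(await res.arrayBuffer());
}

// Runs steps 2-4 in sequence and returns the updated chat history.
async function runVoiceTurn(audioBlob, history, steps = { transcribe, chat, speak }) {
  const userText = await steps.transcribe(audioBlob);
  const messages = [...history, { role: "user", content: userText }];
  const replyText = await steps.chat(messages);
  const audio = await steps.speak(replyText);
  return {
    userText,
    replyText,
    audio,
    history: [...messages, { role: "assistant", content: replyText }],
  };
}
```

Keeping the history in the returned object lets the next turn pass it back in, so the model sees the whole dialogue rather than isolated utterances.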

I haven’t implemented it yet, but it seems to me to be a lot of steps, which might cause lag issues. Is this a feasible way to go, and are there any plans to extend the API to do steps 2-4 in one go, e.g. with an audio dialogue endpoint?
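Before optimizing, it can help to measure which step actually dominates the lag. A minimal sketch using hypothetical helpers (timed, summarize) that record wall-clock time per step with the global performance.now() available in Node 16+ and browsers:

```javascript
// Hypothetical helper: runs an async step and appends { label, ms } to `log`,
// recording the duration even if the step throws.
async function timed(label, fn, log) {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    log.push({ label, ms: performance.now() - start });
  }
}

// Total time across all recorded steps and the label of the slowest one.
function summarize(log) {
  const total = log.reduce((sum, entry) => sum + entry.ms, 0);
  const slowest = log.reduce((a, b) => (b.ms > a.ms ? b : a));
  return { total, slowest: slowest.label };
}
```

Usage would look like `const text = await timed("transcribe", () => transcribe(blob), log);` for each step, where transcribe stands for whatever function performs that request, followed by `summarize(log)` at the end of the turn.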

LinqLover · November 9, 2023, 9:23pm

IIRC, steps 3 and 4 support streaming, which should reduce lag. I’m not aware of any more consolidated approach, though.
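As an illustration of streaming step 3: with `stream: true`, the chat completions endpoint returns server-sent events, one `data: {...}` line per chunk, ending with `data: [DONE]`. A minimal sketch of reading that stream, assuming Node 18+ and an OPENAI_API_KEY environment variable; parseSseChunk and streamChat are hypothetical names and the model is a placeholder.

```javascript
// Parses the complete SSE lines in `buffer`. Returns the text deltas found,
// the trailing partial line (to prepend to the next network chunk), and
// whether the terminating [DONE] event was seen.
function parseSseChunk(buffer) {
  const lines = buffer.split("\n");
  const rest = lines.pop(); // last piece may be an incomplete line
  const deltas = [];
  let done = false;
  for (const line of lines) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue; // skip blank separator lines
    const payload = trimmed.slice(5).trim();
    if (payload === "[DONE]") { done = true; continue; }
    const content = JSON.parse(payload).choices?.[0]?.delta?.content;
    if (content) deltas.push(content); // the first chunk usually carries only the role
  }
  return { deltas, rest, done };
}

// Streams a chat reply and calls onDelta with each text fragment as it arrives.
async function streamChat(messages, onDelta) {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "gpt-3.5-turbo", messages, stream: true }),
  });
  if (!res.ok) throw new Error(`chat failed: ${res.status}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const parsed = parseSseChunk(buffer);
    buffer = parsed.rest;
    parsed.deltas.forEach(onDelta);
    if (parsed.done) break;
  }
}
```

The partial-line handling matters because network chunks do not line up with SSE events; a JSON payload can be split across two reads.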

spacegodreal · November 9, 2023, 10:14pm

Have you had success accessing the text-to-speech endpoint? My GPT never gets an MP3 back…

machin1st · November 9, 2023, 11:01pm

Yes, this works for me:

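    import axios from "axios";

    // Sends `text` to the text-to-speech endpoint and plays the returned audio.
    // VITE_-prefixed variables are embedded in the client bundle, so this key
    // is visible to anyone who loads the app.
    const generateSpeech = async (text) => {
        try {
            const response = await axios.post(
                "https://api.openai.com/v1/audio/speech",
                {
                    model: "tts-1",
                    input: text,
                    voice: "alloy",
                },
                {
                    headers: {
                        Authorization: `Bearer ${
                            import.meta.env.VITE_OPENAI_API_KEY
                        }`,
                    },
                    responseType: "blob", // receive the audio as binary data
                }
            );

            // MP3 is the endpoint's default response_format.
            const blob = new Blob([response.data], { type: "audio/mpeg" });
            const url = window.URL.createObjectURL(blob);
            const audio = new Audio(url);
            // Release the object URL once playback has finished.
            audio.addEventListener("ended", () => window.URL.revokeObjectURL(url));
            await audio.play(); // rejects if the browser blocks autoplay
        } catch (error) {
            console.error(error);
        }
    };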

yafimski · February 12, 2024, 7:49pm

I managed to solve this for React Native with Expo, if anyone is interested.
I posted it as an answer to the Stack Overflow question ‘play-audio-response-from-openai-tts-api-in-react-native-with-expo’ (I can’t post links here).

Foxalabs · February 12, 2024, 8:50pm

Drop me a DM with the link and I’ll add it to your post for you.

jamilbio20 · February 12, 2024, 10:28pm

I follow these exact steps in my Bash shell API wrapper for OpenAI (GitHub: mountaineerbr/shellChatGPT).

The only step you missed is playing the audio file received from OpenAI. Try requesting Opus, which is a more modern format. Also, you can play the audio file while still receiving it! Just be careful with the player you use: some pause briefly when the buffer is empty and resume when new data arrives, while others simply stop playing when the buffer runs dry…
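A rough Node 18+ sketch of both suggestions: request Opus via the speech endpoint's response_format parameter, and pipe the bytes into a player's stdin as they arrive so playback can start before the download finishes. It assumes mpv is installed (mpv reads from stdin when given "-"); the player choice and the helper names (speechRequestBody, streamSpeechToPlayer) are illustrative assumptions, not taken from the post or from shellChatGPT.

```javascript
// Assumes Node 18+, OPENAI_API_KEY in the environment, and mpv on the PATH.
const { spawn } = require("node:child_process");
const { once } = require("node:events");

// Request body for the speech endpoint; response_format selects the codec.
function speechRequestBody(text, format = "opus") {
  return JSON.stringify({ model: "tts-1", voice: "alloy", input: text, response_format: format });
}

// Streams the audio into the player while it downloads; resolves with the
// player's exit code.
async function streamSpeechToPlayer(text, player = "mpv", args = ["--no-video", "-"]) {
  const res = await fetch("https://api.openai.com/v1/audio/speech", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: speechRequestBody(text),
  });
  if (!res.ok) throw new Error(`speech failed: ${res.status}`);

  const proc = spawn(player, args, { stdio: ["pipe", "ignore", "inherit"] });
  const exited = once(proc, "exit"); // listen early so a fast exit is not missed
  exited.catch(() => {}); // spawn errors are rethrown by the await below
  proc.stdin.on("error", () => {}); // player may quit early (EPIPE); exit code reports it

  const reader = res.body.getReader();
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    if (proc.stdin.destroyed) break; // player went away; stop feeding it
    // Respect backpressure: wait when the player's stdin buffer is full.
    if (!proc.stdin.write(value)) await once(proc.stdin, "drain");
  }
  proc.stdin.end();
  const [code] = await exited;
  return code;
}
```

Whether playback stalls or aborts on an empty buffer is then up to the player, which is exactly the difference between players described above.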

I added a replay command to my shell script so that, if the audio player fails (as it does in Termux), the user can replay the audio. Desktop players such as cvlc usually handle this fine: they wait for more data to arrive instead of aborting.
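The replay idea can be approximated in Node by keeping a copy of the received audio on disk and re-running the player on that file when needed. A minimal sketch; the temp-file naming, the mpv default and the helper names (saveAudio, replay) are assumptions for illustration, not how shellChatGPT implements it.

```javascript
// Assumes Node 18+; mpv is only a default and any file-playing command works.
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");
const { spawnSync } = require("node:child_process");

// Concatenates the received chunks (Uint8Array or Buffer) into a temp file
// and returns its path, so the audio survives a failed playback.
function saveAudio(chunks, ext = "opus") {
  const file = path.join(os.tmpdir(), `tts-reply-${Date.now()}.${ext}`);
  fs.writeFileSync(file, Buffer.concat(chunks.map((c) => Buffer.from(c))));
  return file;
}

// Plays a saved file; returns true only if the player exited with status 0.
// A missing player binary yields status null, so this returns false.
function replay(file, player = "mpv", args = ["--no-video"]) {
  const result = spawnSync(player, [...args, file], { stdio: "inherit" });
  return result.status === 0;
}
```

Because the file is complete, replaying it avoids the empty-buffer problem entirely, at the cost of waiting for the full download.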

yafimski · February 13, 2024, 12:40am

That would be awesome. I can’t find a DM button anywhere on the forum, though…

LarrysWorkbench · April 29, 2024, 8:12am

That’s exactly what I did, and yes, it seems like a lot of steps. The lag is actually better than I expected: about the same as talking to the OpenAI app on iPhone, which is tolerable for my hobby robot project. TTS seems to take the most time; I would love to figure out how to stream it.
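One common workaround for TTS latency (not an official streaming mode) is to cut the streamed chat reply into sentences and send each finished sentence to the speech endpoint immediately, so the first clip can play while the rest of the reply is still being generated. A minimal sketch with hypothetical names (takeSentences, createSentenceStreamer); the boundary rule, sentence punctuation followed by whitespace, is a simplification that will also split after abbreviations like "e.g.".

```javascript
// Splits off every complete sentence (ending in . ! or ? plus whitespace)
// and returns the unfinished remainder. "3.14" is not split, because the
// period is not followed by whitespace.
function takeSentences(buffer) {
  const sentences = [];
  const boundary = /[.!?]+\s+/g;
  let start = 0;
  let m;
  while ((m = boundary.exec(buffer)) !== null) {
    const end = m.index + m[0].length;
    const sentence = buffer.slice(start, end).trim();
    if (sentence) sentences.push(sentence);
    start = end;
  }
  return { sentences, rest: buffer.slice(start) };
}

// Feed it streamed text deltas; onSentence fires once per complete sentence
// (e.g. to queue a TTS request). Call flush() when the stream ends.
function createSentenceStreamer(onSentence) {
  let buffer = "";
  return {
    push(delta) {
      buffer += delta;
      const { sentences, rest } = takeSentences(buffer);
      buffer = rest;
      sentences.forEach(onSentence);
    },
    flush() {
      const last = buffer.trim();
      buffer = "";
      if (last) onSentence(last);
    },
  };
}
```

The resulting clips still need to be played strictly in order, e.g. by awaiting each clip's playback before starting the next, even if later TTS requests finish first.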
