Sure, so there are two parts:
The server side is a REST API built with Node.js and Express, something like this:
app.get("/api/stream", async (req, res) => {
const { text, voice } = req.query; // Assuming the text for TTS is passed as a query parameter
generateOpenAIAudio(text, voice, req, res);
});
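One practical note, assuming the Vue app is served from a different origin than this API (it requests http://localhost:3000 directly in the client code below): because the <audio> element uses crossorigin="anonymous", the stream request is a CORS request, so the Express app has to send the CORS headers, for example with the cors middleware:

const cors = require("cors");
// Send Access-Control-Allow-Origin so the crossorigin="anonymous" audio request succeeds
app.use(cors());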
And actually, I’ve just checked: these are the parameters I’m currently using inside the generateOpenAIAudio function:
...
const response = await openai.audio.speech.create({
  model: "tts-1",
  voice: voice,
  input: text,
  response_format: "mp3",
  speed: 1.1,
});
...
res.writeHead(200, {
  "Content-Type": "audio/mpeg",
});
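Putting it together, generateOpenAIAudio can look roughly like this. A couple of assumptions in this sketch: the openai client is already constructed elsewhere in the file, you are on the v4 Node SDK (where speech.create resolves to a fetch-style response whose body is a web ReadableStream), and Node 18+ so Readable.fromWeb is available:

const { Readable } = require("stream");

async function generateOpenAIAudio(text, voice, req, res) {
  try {
    const response = await openai.audio.speech.create({
      model: "tts-1",
      voice: voice,
      input: text,
      response_format: "mp3",
      speed: 1.1,
    });
    res.writeHead(200, {
      "Content-Type": "audio/mpeg",
    });
    // Convert the web ReadableStream into a Node stream and pipe it to the
    // client, so playback can start before the whole MP3 has been generated
    Readable.fromWeb(response.body).pipe(res);
  } catch (err) {
    console.error("TTS request failed:", err);
    if (!res.headersSent) {
      res.status(500).json({ error: "TTS generation failed" });
    }
  }
}

The piping at the end is what makes the endpoint actually stream, instead of buffering the full audio file in memory first.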
The client side is a Vue.js app, where I have something like this inside a component:
template part:
<audio ref="audioPlayer" crossorigin="anonymous"></audio>
script part:
startAudioStream(text, voice) {
  // Encode both query parameters so special characters survive the URL
  const streamUrl = `http://localhost:3000/api/stream?voice=${encodeURIComponent(voice)}&text=${encodeURIComponent(text)}`;
  this.playAudioStream(streamUrl);
},
playAudioStream(streamUrl) {
  // Reference the audio player element.
  const audio = this.$refs.audioPlayer;
  audio.src = streamUrl;
  if (!this.audioContext) {
    // Initialize the AudioContext only once
    this.audioContext = new (window.AudioContext ||
      window.webkitAudioContext)();
    // Create the MediaElementSource node only once...
    this.source = this.audioContext.createMediaElementSource(audio);
    // ...and connect it to the destination, otherwise the element
    // is routed into the audio graph but never reaches the speakers
    this.source.connect(this.audioContext.destination);
  }
  // play() returns a promise, so we can tell whether playback actually
  // started (it can be rejected, e.g. by the browser's autoplay policy)
  audio
    .play()
    .then(() => {
      console.log("Audio playing...");
    })
    .catch((err) => {
      console.error("Error playing audio:", err);
    });
  // You can also add an 'ended' event listener to do something once playback has finished
  audio.onended = () => {
    console.log("Audio ended.");
    ...
  };
},
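In case it helps, here is roughly how those two methods sit in the component. This is just a sketch: the ref and the data properties are the ones the methods above rely on, while the button, the speak handler and the hard-coded "alloy" voice are only illustrative:

<template>
  <div>
    <button @click="speak">Speak</button>
    <audio ref="audioPlayer" crossorigin="anonymous"></audio>
  </div>
</template>

<script>
export default {
  data() {
    return {
      // Declared up front so this.audioContext / this.source exist before first use
      audioContext: null,
      source: null,
    };
  },
  methods: {
    speak() {
      // Example call; "alloy" is just a placeholder voice name
      this.startAudioStream("Hello from the stream!", "alloy");
    },
    startAudioStream(text, voice) { /* as above */ },
    playAudioStream(streamUrl) { /* as above */ },
  },
};
</script>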
I hope it helps.