Playing audio in JS sent from realtime API

Is there any documentation on how to actually play (in JS) the audio deltas sent from Realtime API?

I’m sending each audio delta to my frontend, converting each one to an ArrayBuffer, and trying to decode it with audioContext.decodeAudioData(arrayBuffer), but I’m getting an error saying it can’t decode.

Thanks

2 Likes

Same here. I created a buffer out of each delta and then concatenated them, then I saved the file, but it just won’t play.

1 Like

I’m gonna try concatenating the deltas before the ArrayBuffer conversion and see what happens. Surprised it’s so hard to find a working example of this.

1 Like

Basically, you’re trying to get these “audio deltas” (little chunks of sound) sent by the API to actually play on the browser, right? The issue you’re hitting comes from the fact that the audio isn’t in a format that your browser can decode easily, like a regular WAV or MP3. Instead, it’s probably in something more compressed, like Opus or Ogg, and the browser’s AudioContext.decodeAudioData() can’t handle that directly.

So the fix here is: you’ll need to decode the audio first before the browser can play it. There are libraries out there, like Opus.js or Aurora.js, that can help with this. They take the compressed audio and turn it into a format your browser can handle, like PCM (the basic audio data browsers use).

Once you’ve decoded it, you can pass the data to the AudioContext and it’ll play smoothly. It’s like you’re translating the audio into a language your browser understands.

In short:

  1. Decode the audio delta using a library.
  2. Play it with AudioContext after decoding.

I used to do OGG in IMVU; I sold music rings. :rabbit::honeybee:

1 Like

I have the audio_output_format set to ‘pcm16’. So shouldn’t I be able to play this (once converted to ArrayBuffer) in audioContext with decodeAudioData(ArrayBuffer)?

Thanks

4 Likes

Hey, I got it working. The key for me was that neither decodeAudioData() nor the ‘audio-decode’ npm library can work with raw PCM. You need to wrap it in a WAV container.

  1. (audio delta from OpenAI) Base64 PCM16 → binary string… you can use atob()
  2. Binary string → ArrayBuffer (ChatGPT can write you a function)
  3. Create a wavHeader ArrayBuffer (ChatGPT can write you a function… I used a 24000 samplingRate)
  4. Concat the wavHeader ArrayBuffer + the PCM16 ArrayBuffer created in step 2
  5. The ArrayBuffer from step 4 is what I send via WebSocket to my client (make sure the ws ‘type’ of the client connection is ‘arraybuffer’)
  6. Decode what I receive on the client (the resulting ArrayBuffer from step 4)
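Steps 1–4 can be sketched like this. The helper names are mine, and the 44-byte header layout assumes the canonical WAV format with 16-bit mono PCM at 24 kHz (what the Realtime API sends for ‘pcm16’):

```javascript
// Step 1 + 2: Base64 delta -> binary string -> ArrayBuffer
function base64ToArrayBuffer(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes.buffer;
}

// Step 3: build a canonical 44-byte WAV header for raw PCM
function makeWavHeader(dataLength, sampleRate = 24000, channels = 1, bitsPerSample = 16) {
  const header = new ArrayBuffer(44);
  const view = new DataView(header);
  const byteRate = sampleRate * channels * bitsPerSample / 8;
  const writeString = (offset, s) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };
  writeString(0, 'RIFF');
  view.setUint32(4, 36 + dataLength, true);  // file size minus the first 8 bytes
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);              // fmt chunk size
  view.setUint16(20, 1, true);               // audio format 1 = PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, channels * bitsPerSample / 8, true); // block align
  view.setUint16(34, bitsPerSample, true);
  writeString(36, 'data');
  view.setUint32(40, dataLength, true);      // PCM payload length
  return header;
}

// Step 4: concatenate header + PCM into one ArrayBuffer
function wrapPcmInWav(base64Delta) {
  const pcm = base64ToArrayBuffer(base64Delta);
  const header = makeWavHeader(pcm.byteLength);
  const wav = new Uint8Array(44 + pcm.byteLength);
  wav.set(new Uint8Array(header), 0);
  wav.set(new Uint8Array(pcm), 44);
  return wav.buffer;
}
```

The resulting ArrayBuffer is something decodeAudioData() should accept, since it now looks like a regular WAV file.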

Hope this helps. Let me know if you have issues

4 Likes

You’re a genius :heart: :partying_face:
It works wonders! :rocket: Thank you very much. I guess the wavHeader was the secret ingredient we didn’t know about!

1 Like

Figured it out.

to listen:

export async function startRealtimeListening() {
    try {
        // Start capturing audio
        const stream = await navigator.mediaDevices.getUserMedia({
            audio: {
                sampleRate: 16000,
                channelCount: 1,
                echoCancellation: true,
            },
        });

        const audioContext = new AudioContext({sampleRate: 16000});
        const source = audioContext.createMediaStreamSource(stream);
        const processor = audioContext.createScriptProcessor(4096, 1, 1);

        processor.onaudioprocess = (event) => {
            const audioData = event.inputBuffer.getChannelData(0);

            const int16Buffer = convertFloat32ToInt16(audioData);
            Game.realtimeClient.appendInputAudio(int16Buffer);  // Send audio data to RealtimeClient
        };

        source.connect(processor);
        processor.connect(audioContext.destination);
        Game.isListening = true;
    } catch (error) {
        console.error(error);
        Game.isListening = false;
    }
}

export function convertFloat32ToInt16(buffer) {
    let l = buffer.length;
    const buf = new Int16Array(l);
    while (l--) {
        buf[l] = Math.max(-1, Math.min(1, buffer[l])) * 0x7FFF; // clamp to [-1, 1] before scaling to Int16
    }
    return buf.buffer;
}

to speak:


class AudioQueueManager {
    audioQueue = [];
    isPlaying = false;
    pitchFactor = 0.5; // Default pitch factor for the voices. You can make it sound godlike if run at 0.2

    constructor() {
        makeAutoObservable(this);
    }

    // Function to set the pitch factor
    setPitchFactor(factor) {
        this.pitchFactor = factor;
    }

    // Function to add audio data to the queue
    addAudioToQueue(audioData) {
        this.audioQueue.push(audioData);
        this.playNext(); // Start playing if not already playing
    }

    // Function to play the next audio chunk in the queue
    async playNext() {
        if (this.isPlaying || this.audioQueue.length === 0) return;

        this.isPlaying = true;

        const audioData = this.audioQueue.shift(); // Get the next audio chunk
        await this.playAudio(audioData); // Play the audio

        this.isPlaying = false;
        this.playNext(); // Play the next audio in the queue
    }

    // Function to play a single audio chunk with pitch adjustment
    playAudio(audioBuffer) {
        return new Promise((resolve) => {
            // Create an AudioContext (note: creating one per chunk is wasteful, and
            // some browsers cap the number of concurrent contexts; reusing one is better)
            const audioContext = new (window.AudioContext || window.webkitAudioContext)();

            // Convert Int16Array (PCM16) to Float32Array
            const float32Array = new Float32Array(audioBuffer.length);
            for (let i = 0; i < audioBuffer.length; i++) {
                float32Array[i] = audioBuffer[i] / 0x7FFF; // Normalize to -1.0 to 1.0
            }

            // Create an AudioBuffer at the context's native rate (often 44.1/48 kHz);
            // the API sends 24 kHz PCM, so the pitchFactor above likely compensates for the mismatch
            const audioBufferObj = audioContext.createBuffer(1, float32Array.length, audioContext.sampleRate);
            audioBufferObj.copyToChannel(float32Array, 0); // Copy PCM data to the buffer

            // Create a BufferSource to play the audio
            const source = audioContext.createBufferSource();
            source.buffer = audioBufferObj;
            source.playbackRate.value = this.pitchFactor; // Adjust pitch if necessary
            source.connect(audioContext.destination);

            source.onended = () => {
                resolve(); // Resolve when the playback ends
            };

            source.start(0); // Start playback
        });
    }
}

// Create an instance of the audio queue manager
const audioQueueManager = new AudioQueueManager();

// Function to handle the conversation update events
function handleConversationUpdated(event) {
    const { item, delta } = event;

    if (delta?.audio) {
        audioQueueManager.addAudioToQueue(delta.audio); // Add incoming audio chunk to the queue
    }
}
1 Like

Hi, could you post your code?
I followed the instructions but was not able to get it working.
I’m using native ws implementation and picking up response.audio.delta event.

else if (data.type === "response.audio.delta") {
    audioChunks.push(data.delta);
    decodeAudio(data.delta);
}
1 Like

I got it to play back, but there’s a constant clicking sound. Perhaps there’s a mismatched sample rate, but I have tried several things, and it’s still there.

Could you take a look at my code and see if you spot something? Thank you!

From the server I send

const audioDelta = {
    event: 'media',
    streamSid: streamSid,
    media: { payload: Buffer.from(response.delta, 'base64').toString('base64') }
};
connection.send(JSON.stringify(audioDelta));

Then on client:

if (data.media.payload) {
    const audioData = new Uint8Array(atob(data.media.payload).split("").map(c => c.charCodeAt(0)));
    const wavData = createWavData(audioData, 24000);

    const int16Array = new Int16Array(wavData.buffer);

    audioQueue.push(int16Array);

    playNext();
}
1 Like

It looks like you’re trying to decode the Base64 encoded PCM16 sent to you by the Realtime API. I’d take another look at steps 1 - 4.

  1. Decode from Base64 to binary string using atob()
  2. Convert from binary string to array buffer
  3. Create a wav header array buffer
  4. Concatenate wavHeaderArrayBuffer + audioArrayBuffer

Decode the resulting ArrayBuffer from step 4
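One common pitfall worth checking, assuming a standard 44-byte WAV header: if you build an Int16Array over the whole WAV buffer, the header bytes get interpreted as samples, which produces an audible click at the start of every chunk. Either hand the full WAV buffer to decodeAudioData(), or skip the header when you want raw samples (the helper name here is illustrative):

```javascript
// Skip the 44-byte WAV header so only PCM samples end up in the Int16Array.
// (Assumes the canonical header size; a WAV with extra chunks would need real parsing.)
function pcmSamplesFromWav(wavBuffer) {
  const headerBytes = 44;
  const sampleCount = (wavBuffer.byteLength - headerBytes) / 2; // 2 bytes per 16-bit sample
  return new Int16Array(wavBuffer, headerBytes, sampleCount);
}
```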

2 Likes

Using a ScriptProcessorNode is deprecated; AudioWorklet is its replacement.

I recommend cloning the OpenAI Console demo app. They have working audio streaming code in the libs folder.

1 Like

Can someone help me write a function to resample the audio from 24 kHz to 8 kHz PCM and vice versa? I am currently using an ffmpeg library with code generated by ChatGPT, but it’s not perfect. There is some latency and some noise.

There is probably a more direct way too.
@aistuff posted some steps which are helpful, but does someone have an implementation?

Here is the code

const ffmpeg = require('fluent-ffmpeg');
const { Readable, PassThrough } = require('stream');

function resampleWithFfmpeg(inputBuffer, inputFrequency, outputFrequency) {
  return new Promise((resolve, reject) => {
    const inputBuffer2 = Buffer.from(inputBuffer, 'base64');
    const inputStream = Readable.from(inputBuffer2);
    let outputBuffer = Buffer.alloc(0);
    const outputStream = new PassThrough();

    // Collect output chunks as they arrive
    outputStream.on('data', (chunk) => {
      outputBuffer = Buffer.concat([outputBuffer, chunk]);
    });

    const timeout = setTimeout(() => {
      reject(new Error('Audio processing timeout'));
      outputStream.destroy();
    }, 60000);

    outputStream.on('end', () => clearTimeout(timeout));

    ffmpeg(inputStream)
      .inputFormat('s16le')            // raw 16-bit little-endian PCM in
      .inputOptions([
        `-ar ${inputFrequency}`,       // dynamic input sample rate
        '-ac 1',
      ])
      .outputFormat('s16le')
      .audioFrequency(outputFrequency) // dynamic output sample rate
      .audioChannels(1)                // mono output
      .on('error', (err) => {
        console.error('FFmpeg error:', err.message);
        reject(new Error(`FFmpeg error: ${err.message}`));
      })
      .on('end', () => {
        resolve(outputBuffer.toString('base64'));
      })
      .writeToStream(outputStream);
  });
}
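If spawning ffmpeg per chunk is the source of the latency, a dependency-free option is a naive linear-interpolation resampler in plain JS. This is a sketch only: it has no anti-aliasing filter, so real downsampling to 8 kHz will sound rougher than ffmpeg or sox, but it runs in-process with no subprocess overhead:

```javascript
// Naive linear-interpolation resampler for 16-bit mono PCM.
// No low-pass filtering, so downsampling can alias; fine as a quick sketch.
function resamplePcm16(input, inputRate, outputRate) {
  const ratio = inputRate / outputRate;
  const outLength = Math.floor(input.length / ratio);
  const output = new Int16Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;              // fractional position in the input
    const idx = Math.floor(pos);
    const frac = pos - idx;
    const a = input[idx];
    const b = input[Math.min(idx + 1, input.length - 1)];
    output[i] = Math.round(a + (b - a) * frac); // interpolate between neighbors
  }
  return output;
}
```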

Use sox; I generated this code with Claude.

const sox = require('sox-stream');
const { PassThrough } = require('stream');
const { Buffer } = require('buffer');

/**
 * Resamples PCM audio, with a volume adjustment to prevent clipping
 * @param {string} base64String - Base64 encoded PCM audio data
 * @param {number} targetSampleRate - Output sample rate in Hz
 * @param {number} originalSampleRate - Input sample rate in Hz
 * @param {Object} options - Conversion options
 * @param {number} options.bits - Bit depth of input/output (default: 16)
 * @param {number} options.channels - Number of channels (default: 1)
 * @param {number} options.volumeAdjustment - Volume adjustment in dB (default: -3)
 * @returns {Promise<string>} Base64 encoded resampled PCM audio data
 */
async function resampleAudio(base64String, targetSampleRate, originalSampleRate, {
    bits = 16,
    channels = 1,
    volumeAdjustment = -3
} = {}) {
    if (!base64String || typeof base64String !== 'string') {
        throw new Error('Invalid input: base64String must be a non-empty string');
    }

    return new Promise((resolve, reject) => {
        try {
            // Decode base64 to Buffer
            const inputBuffer = Buffer.from(base64String, 'base64');

            // Create streams
            const inputStream = new PassThrough();
            const outputStream = new PassThrough();

            // Configure sox transform with volume adjustment and rate conversion
            const transform = sox({
                input: {
                    type: 'raw',
                    rate: originalSampleRate,
                    channels: channels,
                    bits: bits,
                    encoding: 'signed-integer',
                },
                output: {
                    type: 'raw',
                    rate: targetSampleRate,
                    channels: channels,
                    bits: bits,
                    encoding: 'signed-integer',
                },
                // Add volume adjustment before rate conversion
                effects: [
                    ['vol', `${volumeAdjustment}dB`],  // Reduce volume to prevent clipping
                    ['rate', '-v', '-L', `${targetSampleRate}`]      // High-quality rate conversion
                ]
            });

            // Set up error handlers
            const handleError = (err) => {
                // Ignore rate warning messages about clipping
                if (err.message && err.message.includes('sox WARN rate')) {
                    return;
                }
                cleanup();
                reject(new Error(`Sample rate conversion failed: ${err.message}`));
            };

            inputStream.on('error', handleError);
            transform.on('error', handleError);
            outputStream.on('error', handleError);

            // Collect output chunks
            const chunks = [];
            outputStream.on('data', chunk => chunks.push(chunk));

            outputStream.on('end', () => {
                try {
                    const resampledBuffer = Buffer.concat(chunks);
                    const outputBase64String = resampledBuffer.toString('base64');
                    cleanup();
                    resolve(outputBase64String);
                } catch (err) {
                    handleError(err);
                }
            });

            // Cleanup function to remove listeners
            const cleanup = () => {
                inputStream.removeAllListeners();
                transform.removeAllListeners();
                outputStream.removeAllListeners();
            };

            // Start the processing pipeline
            inputStream.end(inputBuffer);
            inputStream
                .pipe(transform)
                .pipe(outputStream);

        } catch (err) {
            reject(new Error(`Failed to initialize sample rate conversion: ${err.message}`));
        }
    });
}