Is there any documentation on how to actually play (in JS) the audio deltas sent from the Realtime API?
I’m sending each audio delta to my frontend, then converting each one to an ArrayBuffer and trying to decode it with audioContext.decodeAudioData(arrayBuffer), but I’m getting an error saying it can’t decode.
I’m going to try concatenating the deltas before the ArrayBuffer conversion and see what happens. Surprised it’s so hard to find a working example of this.
Basically, you’re trying to get these “audio deltas” (little chunks of sound) sent by the API to actually play in the browser, right? The issue you’re hitting is that the audio isn’t in a format your browser can decode easily, like a regular WAV or MP3. Instead, it’s probably something more compressed, like Opus or Ogg, and the browser’s AudioContext.decodeAudioData() can’t handle that directly.
So the fix here is: you’ll need to decode the audio first before the browser can play it. There are libraries out there, like Opus.js or Aurora.js, that can help with this. They take the compressed audio and turn it into a format your browser can handle, like PCM (the basic audio data browsers use).
Once you’ve decoded it, you can pass the data to the AudioContext and it’ll play smoothly. It’s like you’re translating the audio into a language your browser understands.
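For reference, decodeAudioData() only works on a complete, containerized file (WAV, MP3, Ogg, etc.) — it sniffs the header to identify the format, so raw sample data is rejected. A minimal sketch of its intended use:

```js
// decodeAudioData() expects a complete encoded file (WAV/MP3/Ogg/...),
// not raw headerless PCM chunks.
const audioContext = new (window.AudioContext || window.webkitAudioContext)();

async function playEncodedAudio(arrayBuffer) {
  // Rejects on raw PCM: there is no header to identify the format
  const decoded = await audioContext.decodeAudioData(arrayBuffer);
  const source = audioContext.createBufferSource();
  source.buffer = decoded;
  source.connect(audioContext.destination);
  source.start();
}
```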
I have the audio_output_format set to ‘pcm16’. So shouldn’t I be able to play this (once converted to an ArrayBuffer) in audioContext with decodeAudioData(arrayBuffer)?
Hey, I got it working. The key for me was that neither decodeAudioData() nor the ‘audio-decode’ npm library can work with raw PCM. You need to wrap it in a WAV container. Here’s what I did (rough sketch after the steps):
1. (audio delta from OpenAI) Base64 PCM16 → binary string… you can use atob()
2. Binary string → ArrayBuffer (ChatGPT can write you a function)
3. Create a wavHeader ArrayBuffer (ChatGPT can write you a function… I used a 24000 sampling rate)
4. Concat the wavHeader ArrayBuffer + the PCM16 ArrayBuffer created in step 2
5. The ArrayBuffer from step 4 is what I send via WebSocket to my client (make sure the client connection’s binaryType is ‘arraybuffer’)
6. Decode what I receive on the client (the ArrayBuffer from step 4)
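Here’s that sketch of steps 1–4 in plain JS. The helper names (createWavHeader, base64PcmToWav) are illustrative, and it assumes mono PCM16 at 24000 Hz:

```js
// Build a 44-byte WAV header for raw PCM data.
// Assumes mono 16-bit PCM at 24000 Hz unless told otherwise.
function createWavHeader(dataLength, sampleRate = 24000, numChannels = 1, bitsPerSample = 16) {
  const header = new ArrayBuffer(44);
  const view = new DataView(header);
  const byteRate = sampleRate * numChannels * (bitsPerSample / 8);
  const blockAlign = numChannels * (bitsPerSample / 8);

  const writeString = (offset, str) => {
    for (let i = 0; i < str.length; i++) view.setUint8(offset + i, str.charCodeAt(i));
  };

  writeString(0, "RIFF");
  view.setUint32(4, 36 + dataLength, true); // file size minus the RIFF tag + size field
  writeString(8, "WAVE");
  writeString(12, "fmt ");
  view.setUint32(16, 16, true);  // fmt chunk size
  view.setUint16(20, 1, true);   // audio format 1 = uncompressed PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeString(36, "data");
  view.setUint32(40, dataLength, true);
  return header;
}

function base64PcmToWav(base64Delta) {
  // Step 1: base64 -> binary string
  const binary = atob(base64Delta);
  // Step 2: binary string -> bytes
  const pcmBytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) pcmBytes[i] = binary.charCodeAt(i);
  // Step 3: build the WAV header
  const header = createWavHeader(pcmBytes.length);
  // Step 4: concatenate header + PCM data
  const wav = new Uint8Array(44 + pcmBytes.length);
  wav.set(new Uint8Array(header), 0);
  wav.set(pcmBytes, 44);
  return wav.buffer; // send this over the WebSocket (binaryType "arraybuffer")
}
```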
import { makeAutoObservable } from "mobx";

class AudioQueueManager {
  audioQueue = [];
  isPlaying = false;
  pitchFactor = 1.0; // 1.0 = normal pitch; lower it (e.g. 0.2) to make the voice sound godlike
  // Reuse a single AudioContext; creating one per chunk is wasteful and can glitch
  audioContext = new (window.AudioContext || window.webkitAudioContext)();

  constructor() {
    makeAutoObservable(this);
  }

  // Set the pitch factor
  setPitchFactor(factor) {
    this.pitchFactor = factor;
  }

  // Add a PCM16 chunk (Int16Array) to the queue
  addAudioToQueue(audioData) {
    this.audioQueue.push(audioData);
    this.playNext(); // Start playing if not already playing
  }

  // Play the next audio chunk in the queue
  async playNext() {
    if (this.isPlaying || this.audioQueue.length === 0) return;
    this.isPlaying = true;
    const audioData = this.audioQueue.shift(); // Get the next audio chunk
    await this.playAudio(audioData); // Play the audio
    this.isPlaying = false;
    this.playNext(); // Play the next audio in the queue
  }

  // Play a single audio chunk with pitch adjustment
  playAudio(pcm16Data) {
    return new Promise((resolve) => {
      // Convert Int16Array (PCM16) to Float32Array
      const float32Array = new Float32Array(pcm16Data.length);
      for (let i = 0; i < pcm16Data.length; i++) {
        float32Array[i] = pcm16Data[i] / 0x7FFF; // Normalize to -1.0..1.0
      }
      // Create the AudioBuffer at 24000 Hz (the Realtime API's pcm16 rate),
      // NOT audioContext.sampleRate — a mismatch shifts speed and pitch
      const audioBufferObj = this.audioContext.createBuffer(1, float32Array.length, 24000);
      audioBufferObj.copyToChannel(float32Array, 0); // Copy PCM data into the buffer
      // Create a BufferSource to play the audio
      const source = this.audioContext.createBufferSource();
      source.buffer = audioBufferObj;
      source.playbackRate.value = this.pitchFactor; // Adjust pitch if desired
      source.connect(this.audioContext.destination);
      source.onended = () => {
        resolve(); // Resolve when playback ends
      };
      source.start(0); // Start playback
    });
  }
}

// Create an instance of the audio queue manager
const audioQueueManager = new AudioQueueManager();

// Handle the conversation update events
function handleConversationUpdated(event) {
  const { item, delta } = event;
  if (delta?.audio) {
    audioQueueManager.addAudioToQueue(delta.audio); // Queue the incoming audio chunk
  }
}
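The detail that matters most above is creating the AudioBuffer at 24000 Hz (the rate the Realtime API uses for pcm16) rather than the AudioContext’s default rate — a mismatch makes playback too fast and too high-pitched, which then has to be hacked around with playbackRate.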
Hi, could you post your code?
I followed the instructions but was not able to get it working.
I’m using the native ws implementation and picking up the response.audio.delta event.
else if (data.type === "response.audio.delta") {
  audioChunks.push(data.delta);
  decodeAudio(data.delta);
}
I got it to play back, but there’s a constant clicking sound. Perhaps there’s a mismatched sample rate, but I have tried several things, and it’s still there.
Could you take a look at my code and see if you spot something? Thank you!
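A sample-rate mismatch is one possible cause, but constant clicking between streamed chunks often comes from scheduling: each chunk is started “now” as an independent sound, leaving tiny gaps or overlaps at the seams. Here’s a sketch of gapless scheduling on a single shared AudioContext — the nextStartTime bookkeeping is the key idea, and the names are illustrative:

```js
// Sketch: schedule each PCM16 chunk back-to-back on one shared clock,
// instead of playing each chunk as a separate one-off sound.
const ctx = new (window.AudioContext || window.webkitAudioContext)();
let nextStartTime = 0;

function schedulePcm16Chunk(int16Data, sampleRate = 24000) {
  const float32 = new Float32Array(int16Data.length);
  for (let i = 0; i < int16Data.length; i++) {
    float32[i] = int16Data[i] / 0x8000; // Normalize to -1.0..1.0
  }
  const buffer = ctx.createBuffer(1, float32.length, sampleRate);
  buffer.copyToChannel(float32, 0);

  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.connect(ctx.destination);

  // Start exactly where the previous chunk ends; clicks appear when
  // chunks start "now" with tiny silences between them
  const startAt = Math.max(nextStartTime, ctx.currentTime);
  source.start(startAt);
  nextStartTime = startAt + buffer.duration;
}
```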
Can someone help me write a function to resample the audio from 24 kHz PCM to 8 kHz and vice versa? I’m currently using an ffmpeg-based approach that ChatGPT generated, but it’s not perfect — there’s some latency and some noise.
I’m assuming there’s a more direct way too. @aistuff posted some helpful steps, but does anyone have an implementation?
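Not a drop-in answer, but here’s a minimal sketch of linear-interpolation resampling in plain JS (no ffmpeg), assuming mono Int16Array input. A production pipeline would low-pass filter before downsampling to avoid aliasing, which may be part of the noise you’re hearing:

```js
// Naive linear-interpolation resampler for mono PCM16 (Int16Array).
// Works in both directions (24000 -> 8000 and 8000 -> 24000).
// Caveat: proper downsampling low-pass filters first to prevent aliasing.
function resamplePcm16(input, fromRate, toRate) {
  const outLength = Math.round(input.length * toRate / fromRate);
  const output = new Int16Array(outLength);
  const ratio = fromRate / toRate;
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;          // fractional position in the input
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    output[i] = Math.round(input[i0] * (1 - frac) + input[i1] * frac);
  }
  return output;
}

// Usage: const eightKhz = resamplePcm16(twentyFourKhz, 24000, 8000);
```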