Noise Issue with TTS Conversion to Base64

seff · November 25, 2024, 9:25am

I’m experiencing an issue with my code for TTS generation. While the TTS output itself seems fine when I check it, the resulting audio after processing ends up as noise. I suspect the problem lies in the conversion of the TTS output to Base64 format:

async generate(gptReply, interactionCount) {
    const { partialResponseIndex, partialResponse } = gptReply;

    if (!partialResponse) { return; }

    try {
        const response = await fetch(
            'https://api.openai.com/v1/audio/speech',
            {
                method: 'POST',
                headers: {
                    'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
                    'Content-Type': 'application/json',
                },
                body: JSON.stringify({
                    input: partialResponse,
                    model: 'tts-1',
                    voice: 'alloy',
                    response_format: 'opus',
                }),
            }
        );

        if (response.status === 200) {
            try {
                const audioArrayBuffer = await response.arrayBuffer();
                const base64String = Buffer.from(audioArrayBuffer).toString('base64');
                this.emit('speech', partialResponseIndex, base64String, partialResponse, interactionCount);
            } catch (err) {
                console.error('Error converting audio to Base64:', err);
            }
        } else {
            console.error('OpenAI TTS error:', response);
        }
    } catch (err) {
        console.error('Error occurred in TextToSpeech service:', err);
    }
}

I would appreciate any advice or insights that can help resolve this issue.

aaron.lutz · November 25, 2024, 10:07am

Hi,

Did you set the right codec parameters before converting to Base64?
If I remember correctly (only off the top of my head right now), I had a similar issue when converting, and it came down to the codec: raw 16-bit PCM audio at 24 kHz, 1 channel, little-endian.

Let me know if this helps. If not, I can go and check in my code.
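
To illustrate what those parameters mean in practice, here is a minimal sketch. It assumes the audio was requested as raw PCM (`response_format: 'pcm'`), which the code in the question does not do. Raw PCM carries no header, so a player cannot tell the sample rate or bit depth and may play noise. Wrapping the samples in a WAV header before Base64 encoding is one way around that. The helper name `pcmToWav` and the silent test buffer are hypothetical, not taken from anyone's code in this thread.

```javascript
// Wrap raw PCM samples in a minimal 44-byte WAV header so players can decode them.
// Defaults follow the parameters mentioned above: 16-bit, 24 kHz, mono, little-endian.
function pcmToWav(pcm, sampleRate = 24000, channels = 1, bitsPerSample = 16) {
    const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
    const header = Buffer.alloc(44);
    header.write('RIFF', 0, 'ascii');
    header.writeUInt32LE(36 + pcm.length, 4);          // file size minus 8 bytes
    header.write('WAVE', 8, 'ascii');
    header.write('fmt ', 12, 'ascii');
    header.writeUInt32LE(16, 16);                      // size of the fmt chunk
    header.writeUInt16LE(1, 20);                       // format tag 1 = linear PCM
    header.writeUInt16LE(channels, 22);
    header.writeUInt32LE(sampleRate, 24);
    header.writeUInt32LE(sampleRate * blockAlign, 28); // byte rate
    header.writeUInt16LE(blockAlign, 32);
    header.writeUInt16LE(bitsPerSample, 34);
    header.write('data', 36, 'ascii');
    header.writeUInt32LE(pcm.length, 40);              // size of the sample data
    return Buffer.concat([header, pcm]);
}

// Hypothetical input: 0.1 s of silence standing in for the API response bytes.
const silence = Buffer.alloc((24000 * 2) / 10);
const wavBase64 = pcmToWav(silence).toString('base64');
```

Whether this helps depends on what the receiving side expects. If the consumer wants headerless PCM, the samples should be sent as they are, and the sample rate has to match on both ends.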

seff · November 25, 2024, 10:30am

Thx, I tried it but couldn’t get it to work. If possible, I’d appreciate it if you could check your code.

_j · November 25, 2024, 10:35am

You’re requesting a file in opus format.

Save it as binary.

Name it speech.ogg
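
A rough sketch of that check, assuming the `opus` response arrives in an Ogg container, as the `.ogg` file name suggests: decode the Base64 string back to bytes, write them unchanged, and confirm the file starts with the Ogg capture pattern `OggS`. The helper names `looksLikeOgg` and `saveBase64Audio` and the fake payload are hypothetical stand-ins for the real API response.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Every Ogg page begins with the 4-byte capture pattern "OggS".
function looksLikeOgg(buffer) {
    return buffer.length >= 4 && buffer.toString('latin1', 0, 4) === 'OggS';
}

// Decode the Base64 payload back to bytes and write them to disk unchanged.
function saveBase64Audio(base64String, filePath) {
    const bytes = Buffer.from(base64String, 'base64');
    fs.writeFileSync(filePath, bytes);
    return bytes;
}

// Hypothetical payload: only the start of an Ogg page, not playable audio.
const fakeOpus = Buffer.concat([
    Buffer.from('OggS', 'latin1'),
    Buffer.from([0, 2, 0, 0]),
]);
const outPath = path.join(os.tmpdir(), 'speech.ogg');
const written = saveBase64Audio(fakeOpus.toString('base64'), outPath);
console.log('Looks like Ogg:', looksLikeOgg(written));
```

If the saved file plays correctly, the Base64 step is not what corrupts the audio. The noise more likely comes from a consumer that treats the Opus bytes as a different format, such as raw PCM.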
