Text-to-speech returning gibberish audio

THIS_IS_D0G · December 13, 2023, 5:44am

I just started using the speech API and am converting text to speech with tts-1 (and tts-1-hd). More often than not, the audio it generates (or at least the audio as it ends up in GCS) is nonsensical. The text is in one of the supported languages (Greek), but the result is often total gibberish. This is a Node.js application, and I’m uploading the audio as an MP3 to Google Cloud Storage. Pardon the sloppy code; I’m just trying to get it to work. Has anyone who has used the text-to-speech models run into this before?

async textToSpeech(req: Request, res: Response) {
    console.log("Getting audio file from OpenAI");
    const mp3 = await openai.audio.speech.create({
        model: "tts-1-hd",
        voice: "echo",
        input: req.body.text,
    });
    // Copy the raw MP3 bytes into a Node Buffer without any text decoding.
    const buffer = Buffer.from(await mp3.arrayBuffer());
    const bucketDestination = await this.cloudStorage.uploadAudioFile(buffer, req.body.text);
    console.log("Responding with " + bucketDestination);
    res.json(bucketDestination);
}
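A cheap way to narrow the problem down is to inspect the first bytes of `buffer` before uploading it. The sketch below is a hypothetical helper (`looksLikeMp3` is not part of the OpenAI SDK): it accepts a buffer that starts with an ID3v2 tag (the ASCII bytes "ID3") or with an MPEG audio frame sync (0xFF followed by a byte whose top three bits are set). It only checks the header, so it can catch an error body or a text-mangled buffer, but not corruption further into the file.

```typescript
// Hypothetical sanity check for bytes returned by the speech endpoint.
// Only the first few bytes are inspected, so this detects "not MP3 at all",
// not damage later in the stream.
function looksLikeMp3(buf: Buffer): boolean {
  if (buf.length < 3) return false;
  // An ID3v2 tag at the start of the file begins with the ASCII bytes "ID3".
  if (buf[0] === 0x49 && buf[1] === 0x44 && buf[2] === 0x33) return true;
  // Otherwise an MPEG audio frame header starts with 11 set sync bits:
  // 0xFF followed by a byte whose top three bits are 111.
  return buf[0] === 0xff && (buf[1] & 0xe0) === 0xe0;
}
```

In `textToSpeech` it could be called right after `const buffer = ...`, logging a warning or failing the request when it returns false.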

async uploadAudioFile(buffer: Buffer, text: string): Promise<string> {
    // Name the object after a hash of the input text, so identical text maps to the same file.
    const hashText = crypto.createHash('sha256').update(text).digest('hex');
    const destination = 'text-to-speech/' + hashText + '.mp3';
    const bucket = this.storage.bucket(this.bucketName);
    const file = bucket.file(destination);

    console.log("Attempting to upload file to " + this.bucketName + "/" + destination);
    const stream = file.createWriteStream({
        metadata: {
            contentType: 'audio/mpeg',
        }
    });

    // Wait for the upload to complete before returning the path. Throwing inside
    // the 'error' listener would never reach the caller, so reject the promise instead.
    await new Promise<void>((resolve, reject) => {
        stream.on('error', (err) => {
            console.error('Upload failed.', err);
            reject(err);
        });
        stream.on('finish', () => {
            console.log(`${destination} uploaded to ${this.bucketName}.`);
            resolve();
        });
        stream.end(buffer);
    });

    return this.bucketName + "/" + destination;
}
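`uploadAudioFile` should only return once the write stream has emitted 'finish'; otherwise the caller can hand out the object path before GCS has the whole file, and an error thrown inside the 'error' listener never reaches the caller. If several places write buffers to streams, the waiting can be factored into a small helper built on Node's `finished()` from `node:stream/promises` (Node 15+). `writeAndWait` is a hypothetical name; with @google-cloud/storage, the stream passed in would be the one returned by `file.createWriteStream(...)`.

```typescript
import { Writable } from "node:stream";
import { finished } from "node:stream/promises";

// Writes the whole buffer to a writable stream and resolves only after the
// stream has emitted 'finish'; rejects if the stream emits 'error'.
// Any Node Writable works, including a GCS upload stream.
async function writeAndWait(stream: Writable, data: Buffer): Promise<void> {
  // Subscribe before ending the stream so no event can be missed.
  const done = finished(stream);
  stream.end(data);
  await done;
}
```

Inside `uploadAudioFile`, `await writeAndWait(stream, buffer)` would replace the separate listeners and `stream.end(buffer)`. The library's `file.save()` method, which also returns a promise, may be simpler still; check the documentation for the version in use.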

supershaneski · December 13, 2023, 7:26am

What do you mean by gibberish? Totally nonsensical? Have you checked whether req.body.text is properly encoded when you pass it to the API?

In my case, I am passing both English and Japanese text generated directly by the Chat Completions API, and I have not encountered anything similar yet.
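One way to act on the encoding question is to test for a common failure mode: UTF-8 Greek text that was decoded as Latin-1 somewhere upstream, which turns each Greek letter into a "Î" or "Ï" followed by a symbol or control character. The helper below is hypothetical and relies on Node's `Buffer`, where the `"latin1"` encoding means ISO-8859-1 (one byte per code point U+0000 to U+00FF). It returns the repaired string when the input looks like that kind of double encoding, and `null` otherwise.

```typescript
// Hypothetical check for UTF-8 text that was mis-decoded as Latin-1.
// Returns the repaired text, or null if the input does not look double-encoded.
function repairLatin1Mojibake(text: string): string | null {
  // Mojibake of this kind contains only code points U+0000..U+00FF.
  if (!/^[\u0000-\u00ff]*$/.test(text)) return null;
  // Recover the original bytes, then try to decode them as UTF-8.
  const decoded = Buffer.from(text, "latin1").toString("utf8");
  // U+FFFD means the bytes were not valid UTF-8, i.e. genuine Latin-1 text.
  if (decoded.includes("\uFFFD")) return null;
  // Plain ASCII decodes to itself and needs no repair.
  return decoded === text ? null : decoded;
}
```

In an Express handler it could be called on `req.body.text` before `openai.audio.speech.create`; a non-null result means the text was already garbled before it reached the speech API.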

THIS_IS_D0G · December 13, 2023, 4:13pm

OK, I confirmed it. When I run TTS with a curl command and download the audio file directly from the OpenAI TTS API, it is fine. Something seems to be going wrong with the upload itself or with how GCS stores the file.
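To pin down exactly where the bytes change, one option is to compare the buffer received from the speech API with the object read back from the bucket (with @google-cloud/storage, `file.download()` resolves to an array whose first element is the file contents as a `Buffer`). The two helpers below are hypothetical: `sha256Hex` gives a quick equal/not-equal answer that can also be logged, and `firstDifference` reports the offset of the first mismatching byte, or -1 when the buffers are identical.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 digest, handy for logging on both sides of the upload.
function sha256Hex(data: Buffer): string {
  return createHash("sha256").update(data).digest("hex");
}

// Offset of the first differing byte, or -1 if the buffers are identical.
// If one buffer is a prefix of the other, the shorter length is returned.
function firstDifference(a: Buffer, b: Buffer): number {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return i;
  }
  return a.length === b.length ? -1 : n;
}
```

If the digests match, the upload is byte-exact and the problem lies elsewhere, for example in how the file is served or played back.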

THIS_IS_D0G · December 13, 2023, 4:44pm

It appears to have something to do with TypeScript: somehow the audio gets corrupted when it is converted into the Buffer object. When I use plain JavaScript, it works fine. I’m a JavaScript/TypeScript noob, so I’ll have to explore this some more, but at least I can get it working with plain JS for now.
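One common way audio bytes get corrupted on the way into a `Buffer` is a detour through a string. The sketch below is an illustration of that failure mode, not a diagnosis of the code in this thread; `bytesViaArrayBuffer` and `bytesViaString` are hypothetical names. `Buffer.from(arrayBuffer)` keeps every byte, which is what `textToSpeech` does; decoding the same bytes as UTF-8 text replaces invalid sequences (such as the 0xFF byte that starts every MP3 frame header) with U+FFFD, so the re-encoded buffer no longer matches the original.

```typescript
// Byte-preserving: wraps the response body's bytes unchanged.
function bytesViaArrayBuffer(body: ArrayBuffer): Buffer {
  return Buffer.from(body);
}

// Lossy for binary data: decoding as UTF-8 replaces invalid byte sequences
// with U+FFFD, and re-encoding writes EF BF BD in their place.
function bytesViaString(body: ArrayBuffer): Buffer {
  const text = Buffer.from(body).toString("utf8");
  return Buffer.from(text, "utf8");
}
```

If any code path calls `toString()` on the response body, or reads it with a text-oriented method, the resulting file is likely to be unplayable or to sound like noise even though the API returned valid audio.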
