Any plans for releasing an API for TTS?


The new voice conversation feature is rolling out now, and it seems to be able to speak in different languages.

Is there a plan for making it available through the API?

I know there are other TTS solutions out there, but OpenAI has been able to provide reasonable prices overall, and it would be useful for platforms that don’t have a native TTS engine.


Hi and welcome to the Developer Forum!

Nothing official has been announced yet. I would imagine that if it is going to become an API, it may come with an official sound/images API announcement. Again, no timelines yet.

Welcome to the community!

OpenAI currently only offers speech-to-text (STT), but we as users hope to have text-to-speech (TTS) capabilities by 2024. In the meantime, you can use another TTS service, such as ElevenLabs, which offers great TTS capabilities.

Who is this “we” that you mention here?


it’s coming!


Yes, we knew that. Thanks for the update :+1:


Can I use it right now, please? Currently I get:
file:///C:/My-progs/Node.JS/tales/server/node_modules/openai/error.mjs:57
return new RateLimitError(status, error, message, headers);
^

RateLimitError: 429 You exceeded your current quota, please check your plan and billing details.
at APIError.generate (file:///C:/My-progs/Node.JS/tales/server/node_modules/openai/error.mjs:57:20)
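A 429 means the account has no available quota, so check the plan and billing page first; retrying will not fix an exhausted quota. For transient rate limits, though, a small backoff wrapper helps. This is just a sketch, assuming the official `openai` Node SDK, which sets a `status` property on the errors it throws (`withRetry` is an illustrative helper name, not an SDK function):

```javascript
// Retry an async call with exponential backoff when it fails with HTTP 429.
// `fn` is any function returning a promise, e.g. a wrapped OpenAI SDK call.
const withRetry = async (fn, { retries = 3, baseDelayMs = 500 } = {}) => {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // The openai Node SDK attaches the HTTP status code to thrown APIErrors.
      if (err.status !== 429 || attempt >= retries) throw err;
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
};
```

You would then wrap a call like `withRetry(() => openai.audio.speech.create({ model: "tts-1", voice: "alloy", input: "hello" }))`.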

It will be rolled out “soon”. If I can get more details, I’ll update.

Yeah, today’s keynote was incredible. I am very happy that we now have a model with more reasonable prices; I hope the rest of the market follows OpenAI’s lead on this.


OK, now it works! Not sure if it was a problem with the key or if the quota was just activated for my account :slight_smile:
Also, all languages seem to be supported and automatically detected from the string. A great improvement over Google TTS. The price is also on par (Google charges $16 per 1M characters for its Neural2 models).

How does it work in languages other than English? Does it sound natural, or does it give you a funny American-accented voice?

This is quite funny. Russian has a slight (really almost unnoticeable) accent on some letters. But it is better than Google’s, I would say.

See example here: https://mangatv.shop/api/video/B7zROoMrVxgfhx_rxdl2u.mp4

Italian TTS is quite good, with a slight British accent.

Hm, the link you shared is in English?

How can I use the TTS API? I am following the OpenAI documentation: OpenAI Platform

Could you send me a simple working code snippet? With the code in the documentation, I am getting errors.

Yep, because the accent is actually hard to spot. Let me know which language interests you, and I will generate an example.

Like this (NodeJS):

import fs from "fs";
import util from "util";
import { nanoid } from "nanoid"; // missing in the original snippet; names the output file
import { getAudioDurationInSeconds } from "get-audio-duration"; // missing in the original snippet
import textToSpeech from "@google-cloud/text-to-speech";
import { openai } from "./index.js";

const tts = new textToSpeech.TextToSpeechClient();
const OpenAiVoices = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"];

// Route to OpenAI if the requested voice is one of theirs, otherwise fall back to Google TTS.
export const getAudio = async (text, lang, voice) => {
  const audio = OpenAiVoices.includes(voice)
    ? await getOpenAiAudio(text, voice)
    : await getGoogleAudio(text, lang, voice);
  const writeFile = util.promisify(fs.writeFile);
  const audioName = nanoid();
  await writeFile(`./media/audio/${audioName}.mp3`, audio, "binary");
  return {
    url: `/audio/${audioName}.mp3`,
    duration: await getAudioDurationInSeconds(`./media/audio/${audioName}.mp3`),
  };
};

const getOpenAiAudio = async (text, voice) => {
  const mp3 = await openai.audio.speech.create({
    model: "tts-1",
    voice: voice,
    input: text,
  });
  return Buffer.from(await mp3.arrayBuffer());
};

const getGoogleAudio = async (text, lang, voice) => {
  const request = {
    input: { text: text },
    voice: { languageCode: lang, name: voice },
    audioConfig: { audioEncoding: "MP3" },
  };
  const [response] = await tts.synthesizeSpeech(request);
  return response.audioContent;
};

No, I thought you had produced audio spoken in Russian via OpenAI’s TTS?

You want to use OpenAI TTS? Which language? For Python, try running the code below; other languages work similarly.

python :snake: Code
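A minimal sketch with the official `openai` Python package (v1+); the voice name, output path, and the `synthesize` helper are just examples, and the optional `client` parameter only exists so you can pass your own configured client:

```python
def synthesize(text, path, client=None):
    """Generate speech for `text` and write an MP3 file to `path`."""
    if client is None:
        # Requires `pip install openai` and the OPENAI_API_KEY environment variable.
        from openai import OpenAI
        client = OpenAI()
    response = client.audio.speech.create(
        model="tts-1",
        voice="alloy",  # or: echo, fable, onyx, nova, shimmer
        input=text,
    )
    # The SDK returns binary response content with a write_to_file helper.
    response.write_to_file(path)
    return path

# synthesize("Hello from the TTS API!", "speech.mp3")
```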

Basically, I am looking for information on which languages are supported. There is no word about it in the docs, and nothing was mentioned in yesterday’s keynote. It is probably one of:

a) only English is supported, and they are pretending other languages don’t exist
b) everything is supported perfectly, and they did not find it important to mention
c) everything kind of works, but all speech is produced with an English-speaking voice, producing awkward results for non-English speech