Hey everyone, I found it odd that the OpenAI blog wasn’t updated with the release of the new voice mode… but it seems the new voice mode has now been rolled out to everyone!
I was wondering if anyone has already found a way to use it in the API, or if there is any news on it… I tried using tts-2 as the model but got the good old 'model_not_found'.
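For reference, the only published speech model names today are `tts-1` and `tts-1-hd`. Here is a minimal sketch of the request body the `/v1/audio/speech` endpoint expects (the helper function and its validation are my own illustration, not part of the API) — passing any other name as `model`, such as the hypothetical `tts-2`, is exactly what produces `model_not_found`:

```python
# Sketch of a request body for POST https://api.openai.com/v1/audio/speech.
# The model names below are the currently documented ones; "tts-2" is not
# among them, which is why the server answers with model_not_found.
KNOWN_TTS_MODELS = {"tts-1", "tts-1-hd"}

def speech_payload(text: str, model: str = "tts-1", voice: str = "alloy") -> dict:
    """Build the JSON body for the speech endpoint (illustrative helper)."""
    if model not in KNOWN_TTS_MODELS:
        # The server would reject this request with a model_not_found error.
        raise ValueError(f"model_not_found: {model!r}")
    return {"model": model, "input": text, "voice": voice}
```

Sending that body with an `Authorization: Bearer` header returns the audio bytes; swapping in `"tts-2"` fails at the server today.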
Thank you for sharing this @foxlabs! Any news on whether there will be API changes, or whether it will just be a model change? I’m also curious whether it will be voice-to-voice, or whether there will be a text-to-speech option.
Me too! I asked the question last week and was told there is an API in the pipeline; it’s still early in the dev cycle, and not much more than that.
I did mention that there would be large demand for a voice-to-voice endpoint as well as TTS/STT. It may be that voice-to-voice ends up being two calls — unknown at this point in time.
I was a bit surprised this wasn’t part of demo day '24, since they already have the much-improved voice model in advanced voice mode and the realtime API. I assume the TTS endpoints will continue to be updated going forward? If the newer models are truly omni, then voice/audio is just another output, but the TTS endpoint still seems to make the most sense for certain applications.
“tts-2” is a guess with no basis in fact, except that the name tts-1 leaves room for it.
Demonstration of multimodal gpt-4o, aka “advanced voice mode”, is likely what sparked this discussion.
API chat with audio is now available via the chat completions endpoint. But note: gpt-4o is an AI talking back to you, not just generation of an audio stream from input text.
| Type | Model | Price |
|---|---|---|
| Audio/speech | tts-1 | $0.15 / 10K input characters |
| Language AI model | gpt-4o-audio-preview-2024-10-01 | $2.00 / 10K output tokens |
Prompting an AI model to “repeat this back with your voice” — an unreliable form of text-to-speech, with variants of the same voices — is not a viable proposition for transcribing documents to audio.
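For anyone who wants to try the chat-with-audio route anyway, here is a minimal sketch using the official `openai` Python SDK, assuming the documented `modalities` and `audio` parameters for `gpt-4o-audio-preview` at the time of writing. Keep in mind it is an AI reply rendered as audio, not a faithful read-back of your input text:

```python
import base64
import os

# Request arguments for the chat completions endpoint with audio output.
# The model name and the modalities/audio parameters follow the current
# docs; adjust if the API has moved on.
request_kwargs = {
    "model": "gpt-4o-audio-preview",
    "modalities": ["text", "audio"],
    "audio": {"voice": "alloy", "format": "wav"},
    "messages": [
        {"role": "user", "content": "Say hello in one short sentence."}
    ],
}

# Only make the network call when a key is configured.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI  # pip install openai

    client = OpenAI()
    completion = client.chat.completions.create(**request_kwargs)
    # The audio comes back base64-encoded on the assistant message.
    wav_bytes = base64.b64decode(completion.choices[0].message.audio.data)
    with open("reply.wav", "wb") as f:
        f.write(wav_bytes)
```

Note the pricing consequence: you pay audio-token output rates for whatever the model decides to say, which is part of why this is a poor substitute for a dedicated TTS endpoint.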
This is sorely needed! The current TTS API sounds awful compared to, say, even the “Read Aloud” voice in the ChatGPT app. Why can’t that just be made available to everyone, the same way it’s used in ChatGPT via the button?