Hey everyone, I found it odd that the OpenAI blog wasn’t updated with the release of the new voice mode… but it seems the new voice mode has now been rolled out to everyone!
I was wondering if anyone has already found a way to use it in the API, or if there is any news on it… I tried using tts-2 as the model but got the good old 'model_not_found'.
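For reference, the only published speech model names today are `tts-1` and `tts-1-hd`. Here is a minimal sketch of the request body the `/v1/audio/speech` endpoint expects (the helper function and its validation are my own illustration, not part of the API) — passing any other name as `model`, such as the hypothetical `tts-2`, is exactly what produces `model_not_found`:

```python
# Sketch of a request body for POST https://api.openai.com/v1/audio/speech.
# The model names below are the currently documented ones; "tts-2" is not
# among them, which is why the server answers with model_not_found.
KNOWN_TTS_MODELS = {"tts-1", "tts-1-hd"}

def speech_payload(text: str, model: str = "tts-1", voice: str = "alloy") -> dict:
    """Build the JSON body for the speech endpoint (illustrative helper)."""
    if model not in KNOWN_TTS_MODELS:
        # The server would reject this request with a model_not_found error.
        raise ValueError(f"model_not_found: {model!r}")
    return {"model": model, "input": text, "voice": voice}
```

Sending that body with an `Authorization: Bearer` header returns the audio bytes; swapping in `"tts-2"` fails at the server today.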
Thank you for sharing this @foxlabs! Any news on whether there will be API changes, or whether it will just be a model change? I’m also curious whether it will be voice-to-voice, or whether there will be a text-to-speech option.
Me too! I asked the question last week and was told there is an API in the pipeline; it’s still early in the dev cycle, and not much more than that.
I did mention that there would be large demand for a voice-to-voice endpoint as well as TTS/STT. It may be that voice-to-voice ends up being two calls — unknown at this point in time.
I was a bit surprised this wasn’t part of demo day '24, since they already have the much-improved voice model in advanced voice mode and the realtime API. I assume the TTS endpoints will continue to be updated going forward? If the newer models are truly omni, then voice/audio is just another output, but the TTS endpoint still seems to make the most sense for certain applications.
“tts-2” is a guess with no basis in fact, except that the name tts-1 leaves room for it.
Demonstration of multimodal gpt-4o, aka “advanced voice mode”, is likely what sparked this discussion.
API chat with audio is now available via the chat completions endpoint. But note: gpt-4o is an AI talking back to you, not just generation of an audio stream from input text.
| Type | Model | Price |
|---|---|---|
| Audio/speech | tts-1 | $0.15 / 10K input characters |
| Language AI model | gpt-4o-audio-preview-2024-10-01 | $2.00 / 10K output tokens |
Prompting an AI model to “repeat this back with your voice” — an unreliable form of text-to-speech, with variants of the same voices — is not a viable proposition for transcribing documents to audio.
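For anyone who wants to try the chat-with-audio route anyway, here is a minimal sketch using the official `openai` Python SDK, assuming the documented `modalities` and `audio` parameters for `gpt-4o-audio-preview` at the time of writing. Keep in mind it is an AI reply rendered as audio, not a faithful read-back of your input text:

```python
import base64
import os

# Request arguments for the chat completions endpoint with audio output.
# The model name and the modalities/audio parameters follow the current
# docs; adjust if the API has moved on.
request_kwargs = {
    "model": "gpt-4o-audio-preview",
    "modalities": ["text", "audio"],
    "audio": {"voice": "alloy", "format": "wav"},
    "messages": [
        {"role": "user", "content": "Say hello in one short sentence."}
    ],
}

# Only make the network call when a key is configured.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI  # pip install openai

    client = OpenAI()
    completion = client.chat.completions.create(**request_kwargs)
    # The audio comes back base64-encoded on the assistant message.
    wav_bytes = base64.b64decode(completion.choices[0].message.audio.data)
    with open("reply.wav", "wb") as f:
        f.write(wav_bytes)
```

Note the pricing consequence: you pay audio-token output rates for whatever the model decides to say, which is part of why this is a poor substitute for a dedicated TTS endpoint.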
This is sorely needed! The current TTS API sounds awful compared to, say, even the “Read Aloud” voice in the ChatGPT app. Why can’t that just be made available to everyone, the same way it’s used in ChatGPT via the button?