I noticed that the TTS endpoint already appears in the API documentation (OpenAI Platform), but when I tried to use it I received the following error: ‘The model tts-1 does not exist or you do not have access to it.’
Does anyone know how to get access to these models?
I’m hitting the same wall with the TTS models. Followed the docs to the letter but still getting the ‘model does not exist or access not allowed’ message. If you find a way to get this working or hear back from OpenAI on this, I’d love to get a heads up!
So excited about this, but I feel like latency will still be an issue for use cases where you’re trying to have real-time conversations. It would be great if we had the option of telling the chat completions endpoint to return audio instead of text. Judging by how fast this is all moving, I’m sure that’s only a few weeks away.
Right, but it looks like we pass the text to be spoken into this new endpoint (which for this use case would be the output of an LLM). So the user finishes speaking, that input goes to chat completion, and the result of the chat completion goes to the text-to-speech endpoint. That’s the latency I’m worried about. Anyway, it’s definitely getting closer.
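To put numbers on that worry, here is a minimal sketch of the two-stage turn described above, with each stage timed separately. The names `timed_voice_turn`, `chat_fn` and `tts_fn` are hypothetical: the two callables stand in for whatever wrappers you write around the chat completions and text-to-speech endpoints, so nothing here is tied to a particular client library.

```python
import time
from typing import Callable, Dict, Tuple


def timed_voice_turn(
    user_text: str,
    chat_fn: Callable[[str], str],   # hypothetical wrapper around chat completions
    tts_fn: Callable[[str], bytes],  # hypothetical wrapper around text-to-speech
) -> Tuple[bytes, Dict[str, float]]:
    """Run chat completion, then text-to-speech, and time each stage."""
    timings: Dict[str, float] = {}

    start = time.perf_counter()
    reply_text = chat_fn(user_text)  # stage 1: the LLM produces the full reply
    timings["chat"] = time.perf_counter() - start

    start = time.perf_counter()
    audio = tts_fn(reply_text)       # stage 2: the reply text is turned into audio
    timings["tts"] = time.perf_counter() - start

    # The stages run back to back, so the user waits for the sum of both.
    timings["total"] = timings["chat"] + timings["tts"]
    return audio, timings
```

Because the stages are strictly sequential, the time before any audio is available is at least `timings["total"]`, which is the latency being discussed here.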
They did not say anything about which languages are supported. Although I’m a bit disappointed, I understand if it is only English to start with, but maybe you should not pretend that the rest of the world does not exist.
Yes, I am having this issue: I am feeding tts-1 the stream from gpt-4 and it doesn’t seem to work well. It only works well if you pass it an entire message and then stream the audio, but again, as you mentioned, the latency of doing so is an issue.
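One possible middle ground between sending individual streamed fragments and waiting for the entire message is to buffer the stream and hand whole sentences to the speech endpoint as they complete. The sketch below only shows the buffering part; `sentences_from_stream` is a hypothetical helper, and the sentence boundary rule (`.`, `!` or `?` followed by whitespace) is a simplifying assumption that will split incorrectly on abbreviations such as "e.g.".

```python
import re
from typing import Iterable, Iterator

# Assumed sentence boundary: ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed text fragments into whole sentences."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last one is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        # The last part may still be growing, so keep it buffered.
        buffer = parts[-1]
    # Flush whatever is left once the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    fragments = ["Hel", "lo the", "re. How", " are you", "? Fine"]
    for sentence in sentences_from_stream(fragments):
        print(sentence)
```

Each yielded sentence could then be sent as a separate text-to-speech request, so audio for the first sentence can start while the model is still generating the rest. Whether the joins between separately generated clips sound natural is something you would have to check by ear.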
BTW do you know if you are streaming the audio, or actually getting it in full?
To me it seems that the API examples and the Python client library currently only download the whole MP3 in full.
I added a parameter to enable streaming for speech generation in the library, and I think it works in my version now. It is PR 724 in openai-python (apparently I can’t put links here).
I can confirm the HTTP response includes the header Transfer-Encoding: chunked. I was able to stream the response in Node. But the initial latency is close to 1 second, so it’s not ideal.
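For anyone who wants to try the same thing without the client library, here is a rough Python sketch that reads the chunked response incrementally using only the standard library and reports how long the first chunk took. The helper names `copy_stream` and `stream_speech_to_file` are hypothetical, the `alloy` voice is just an example value, and the 4096-byte read size is an arbitrary choice. If your account has no access to the model, `urlopen` raises `urllib.error.HTTPError` rather than returning audio.

```python
import json
import time
import urllib.request
from typing import BinaryIO, Callable, Optional, Tuple


def copy_stream(
    read: Callable[[int], bytes],
    out: BinaryIO,
    chunk_size: int = 4096,
    started_at: Optional[float] = None,
) -> Tuple[int, Optional[float]]:
    """Copy data from read() to out piece by piece.

    Returns (bytes_written, seconds_until_first_chunk); the second value
    is None if no data arrived at all.
    """
    if started_at is None:
        started_at = time.perf_counter()
    total = 0
    first_chunk_delay = None
    while True:
        chunk = read(chunk_size)
        if not chunk:  # empty bytes means the stream has ended
            break
        if first_chunk_delay is None:
            first_chunk_delay = time.perf_counter() - started_at
        out.write(chunk)  # a player could consume the chunk here instead
        total += len(chunk)
    return total, first_chunk_delay


def stream_speech_to_file(api_key: str, text: str, path: str) -> Tuple[int, Optional[float]]:
    """POST to the speech endpoint and write the audio to disk as it arrives."""
    body = json.dumps({"model": "tts-1", "voice": "alloy", "input": text})
    request = urllib.request.Request(
        "https://api.openai.com/v1/audio/speech",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    started_at = time.perf_counter()  # include the request round trip in the timing
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        return copy_stream(response.read, out, started_at=started_at)
```

The delay returned by `stream_speech_to_file` is measured from just before the request is sent, so it should be comparable to the initial latency mentioned above.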