I am passing the correct speed value to the API but it is being ignored. The voices always play back at the default speed regardless of which speed value is specified. This appears to be an issue only with gpt-4o-mini-tts and not with tts-1.
Hi. Did you get this issue resolved? I’m also working with gpt-4o-mini-tts API and unable to configure the desired speed, even though the speed value is correctly passed to the API in the request payload.
I can confirm that I was also able to reproduce this on my side. Thank you @bret1 and @sharjeelfaiq for reporting this issue.
Do you have any idea how soon the issue is expected to get resolved?
I think the new model just works differently, it’s not a bug.
Actually, the tts-1 model has a bug: it creates the slower version by losing bitrate (it just stretches the audio).
The new model will actually generate the speech as a person talking slower or faster.
For example, put the speed in the instructions: “speak very very fast” or “speak a bit slow and paused”. Then it will generate the speech at normal quality regardless of the speed.
I actually liked this new improvement a lot, as the old model would sound strange and metallic when the speed was changed from normal.
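For anyone who wants to try this, here is a minimal sketch of passing the pace through the instructions field instead of speed. It assumes the official openai Python SDK and an OPENAI_API_KEY in the environment; the helper names build_tts_request and synthesize and the "alloy" default are only illustrative, not anything from the posts above.

```python
from pathlib import Path


def build_tts_request(text: str, pace_hint: str, voice: str = "alloy") -> dict:
    """Build keyword arguments for a gpt-4o-mini-tts speech request.

    Hypothetical helper: the pace is described in words via `instructions`
    rather than passed as a numeric `speed`.
    """
    return {
        "model": "gpt-4o-mini-tts",
        "voice": voice,
        "input": text,
        "instructions": pace_hint,
    }


def synthesize(text: str, pace_hint: str, out_path: Path) -> None:
    """Send the request and write the audio to out_path (needs network access)."""
    # Imported lazily so build_tts_request works without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with client.audio.speech.with_streaming_response.create(
        **build_tts_request(text, pace_hint)
    ) as response:
        response.stream_to_file(out_path)


if __name__ == "__main__":
    synthesize("Hello there.", "speak a bit slow and paused", Path("hello.mp3"))
```

Results may still vary from call to call, so it is worth listening to a few samples before relying on it.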
Yes but it would be nice if the speed values as documented in the API would still be supported (perhaps translated to a prompt behind the scenes). This would maintain compatibility with existing TTS apps & wouldn’t require a paradigm shift.
And regardless, I did try experimenting with changing the speed via prompt and it was very inconsistent. It didn’t always work. Another issue with the new voices is that they can sound like completely different speakers from one API call to the next.
Certainly, you are right on that. I guess they should change the docs to reflect that.
Meanwhile, I think it would have to be done programmatically inside the app.
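Something like the following is what I have in mind: a rough sketch that turns an existing numeric speed setting into an instruction phrase inside the app. The function name speed_to_instruction, the thresholds, and the wording of the phrases are all my own assumptions; only the 0.25 to 4.0 range comes from the documented speed parameter.

```python
def speed_to_instruction(speed: float) -> str:
    """Map a numeric speed (0.25 to 4.0, 1.0 = normal) to a pacing instruction.

    The bucket boundaries and phrases are arbitrary illustrative choices.
    """
    if not 0.25 <= speed <= 4.0:
        raise ValueError(f"speed must be between 0.25 and 4.0, got {speed}")
    if speed < 0.75:
        return "Speak very slowly, with clear pauses."
    if speed < 0.95:
        return "Speak a bit slower than normal."
    if speed <= 1.05:
        return "Speak at a natural, normal pace."
    if speed <= 1.5:
        return "Speak a bit faster than normal."
    return "Speak very fast."


if __name__ == "__main__":
    for s in (0.5, 0.9, 1.0, 1.25, 2.0):
        print(s, "->", speed_to_instruction(s))
```

The returned string would go into the instructions field of the request. It only nudges the model, so it will not give the exact ratio that a numeric speed implies.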
I tried that. I was either prompting them wrong or the model doesn’t follow instructions well. It was very inconsistent. And it also seemed to increase the likelihood of the voices sounding like different speakers between API calls.
Yeah, I noticed that too, and sometimes there are still some strange pauses. There is a lot of room for improvement.
For my use cases, which are not that demanding, it was a good update overall.
But if you need consistency, the tts-1 model is still better.
Hopefully they will provide us a fix soon.
Thanks for raising this! I’ve flagged this to the team to look into. We’ll update back once we have more info.
Update:
The speed parameter is not supported for gpt-4o-mini-tts currently. This was a bug in our documentation, which has been updated. Thanks again for flagging this!
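For apps that talk to both models, one way to reflect this is to attach speed only where it takes effect. The sketch below is an assumption-laden example: speech_kwargs and SPEED_MODELS are hypothetical names, and listing tts-1-hd next to tts-1 is my own addition rather than something stated in this thread.

```python
from typing import Optional

# Models assumed to honor the numeric speed parameter (tts-1-hd is an assumption).
SPEED_MODELS = {"tts-1", "tts-1-hd"}


def speech_kwargs(
    model: str, voice: str, text: str, speed: Optional[float] = None
) -> dict:
    """Build request arguments, including speed only for models that use it."""
    kwargs = {"model": model, "voice": voice, "input": text}
    if speed is not None and model in SPEED_MODELS:
        kwargs["speed"] = speed
    return kwargs


if __name__ == "__main__":
    print(speech_kwargs("tts-1", "alloy", "Hello", speed=1.5))
    print(speech_kwargs("gpt-4o-mini-tts", "alloy", "Hello", speed=1.5))
```

The first call keeps speed in the arguments and the second drops it, so the app never sends a value that would be silently ignored.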
You’re welcome. What about the inconsistency of the voice between API calls? And its inconsistent direction following in terms of pitch and speed? I wanted to offer the new voices in my TTS app but cannot due to the inconsistency, which is unfortunate because I’m sure they’re great in other respects.
Without disregarding the issue, I think adding more instructions makes the output a little more stable, as seen on openai.fm