This may be a stupid question: Why have snapshots on these models? Why not just upgrade gpt-4o-mini-transcribe and gpt-4o-mini-tts and gpt-realtime-mini to newer versions without snapshot naming conventions?
So now you have:
gpt-4o-mini-transcribe-2025-12-15 and gpt-4o-mini-transcribe
gpt-4o-mini-tts-2025-12-15 and gpt-4o-mini-tts
gpt-realtime-mini-2025-12-15 and gpt-realtime-mini
We would love to roll out new versions of gpt-4o-mini-transcribe and gpt-4o-mini-tts to our customers, but not with snapshot naming conventions. Why? Because it feels so temporary.
I may be missing some context here, so my first question is whether this strategy of always upgrading to the latest snapshot, for example by using a generic slug like gpt-4o-mini-tts, is the same approach you would prefer for other model types as well?
Regardless of benchmark improvements, we typically evaluate new models against real production use cases before rolling them out to users and customers.
I’m most likely missing something. Feel free to point me in the right direction.
After a model has been rolled out, vetted, and become “recommended”, the convenience alias may be pointed to that newer model. Or it may never be, as in the case of gpt-4o, which was never pointed to gpt-4o-2024-11-20 (newer than gpt-4o-2024-08-06).
If you want stability in your applications, use the versioned model when one is offered. Then test a new model like this release yourself, make any adjustments needed to your prompting and parameters, and move to the latest version when you choose.
Do not let OpenAI decide when to switch on you and switch the behavior of your application by employing the alias. Got it?
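To make the point concrete, here is a minimal sketch of that pinning pattern: keep the versioned snapshot ids in one place instead of scattering the moving alias through your code. The helper name and dictionary are illustrative, not part of any OpenAI SDK; the snapshot ids are the ones discussed in this thread.

```python
# Pin versioned snapshots in one place instead of relying on the
# alias (e.g. "gpt-4o-mini-tts"), which OpenAI may repoint at any time.
# Snapshot ids below are the ones mentioned in this thread.
PINNED_MODELS = {
    "transcribe": "gpt-4o-mini-transcribe-2025-12-15",
    "tts": "gpt-4o-mini-tts-2025-12-15",
    "realtime": "gpt-realtime-mini-2025-12-15",
}

def model_for(task: str) -> str:
    """Return the pinned snapshot id for a task; fail loudly on unknown tasks."""
    try:
        return PINNED_MODELS[task]
    except KeyError:
        raise ValueError(f"no pinned model for task {task!r}")
```

When you have vetted a new snapshot yourself, you update this one table and the whole application moves in a single, deliberate step.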
I’m always happy to see new audio models, particularly TTS.
Apparently the new snapshot has more trouble following instructions than the previous one (like “speak slower” or “[laughs]”), so it seems this might require more extensive testing to see what pros and cons it brings.
There is a huge disconnect between what OpenAI claimed and how this new smaller model is performing. It only follows instructions / calls tools about 50% of the time.
We have recently upgraded the tts / speech models in our app (Onsen - AI for Mental Health) to the new voice models.
We’ve received some complaints from users that the new speech model is not as emotive as the previous snapshot from March. I’ve done some testing, and indeed the new speech model seems to completely ignore the “instructions” parameter, which we use to provide a custom “voice personality” for our AI guides.
Can you comment if this is a bug, a regression or an intentional reduction of feature for the new speech model? It will be great if this is documented as I could not find any information in the official docs or the official blog post.
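For reference, this is roughly how we pass the personality text. It is a minimal sketch of the request we assemble for the speech endpoint's `instructions` parameter; the helper function, voice choice, and sample strings are illustrative, not from our actual codebase.

```python
# Sketch: build the kwargs we pass to the speech endpoint, with the
# per-guide "voice personality" carried in the "instructions" field,
# i.e. client.audio.speech.create(**req). Helper and strings are examples.

def build_speech_request(text: str, personality: str, model: str) -> dict:
    """Assemble keyword arguments for a text-to-speech request."""
    return {
        "model": model,
        "voice": "alloy",             # example voice
        "input": text,
        "instructions": personality,  # the field the new snapshot appears to ignore
    }

req = build_speech_request(
    "Let's begin with a slow breathing exercise.",
    "Speak in a calm, warm, unhurried tone.",
    "gpt-4o-mini-tts-2025-12-15",
)
```

The same request with `model="gpt-4o-mini-tts"` (the March-era alias) produces audibly different, more emotive output, which is what prompted the question above.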
To me, this sounds like either a case where prompt tuning is needed to restore the emotion your users are missing, or a situation where users have grown accustomed to the previous voice or model and simply do not like the new one.
I hope this helps!
Feel free to start a new topic in the Prompting category, mainly for learning purposes rather than to share any secrets.
Hi @vb, could you share the code or API request you used to generate these examples?
I am not able to replicate this on my end. I’ve recorded a short video contrasting the older and new models in the OpenAI playground, and it is very evident that the new model does not follow the instructions at all.
I thought I might be using the API wrong, but the issue seems to affect the OpenAI playground too based on the video.
Hope you can take a look and let me know your thoughts.
The 2025-03 snapshot is pretty good at following instructions, but there were several complaints about quality and unstable results, with different outputs sounding like totally different voices. This was fine for me, but it seemed to bother a lot of people.
The 2025-12 snapshot seems to improve voice quality and stability, at the cost of losing a lot of steerability from instructions. Also, the fact that the new model was not made the default for the alias gpt-4o-mini-tts suggests that OpenAI knows this and took a cautious step to prevent apps from breaking, which is appreciated.
So, each version has its own pros and cons. Considering audio models haven’t been the highlight of all the AI hype at the end of 2025, I still consider it a win that they actually released something.
Worst case, we didn’t lose anything: the old model is still there, and the new one helps people who just want higher quality with default settings.
In due time I hope they will manage to put things together in the next release, so that we can have both quality and good instruction following.
We ended up rolling out a new version of the app that supports both text-to-speech models.
For now we use the March snapshot for “Expressive” voices and the December snapshot for “Standard” voices - and we give the choice to the user to decide. See below:
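The routing is simple. Here is a minimal sketch of it; the tier names match the UI labels above, the December snapshot id is the one from this thread, and the March snapshot id is an assumed placeholder (I’m not pasting our real config).

```python
# Route each voice tier to a pinned TTS snapshot.
# "gpt-4o-mini-tts-2025-03-20" is an assumed/illustrative id for the
# March snapshot; the December id is the one discussed in this thread.
SNAPSHOT_BY_TIER = {
    "expressive": "gpt-4o-mini-tts-2025-03-20",  # better instruction following
    "standard": "gpt-4o-mini-tts-2025-12-15",    # better quality and stability
}

def tts_model_for(tier: str) -> str:
    """Map a user-selected voice tier to a model id, defaulting to standard."""
    return SNAPSHOT_BY_TIER.get(tier.lower(), SNAPSHOT_BY_TIER["standard"])
```

This way users who care about steerability keep the March behavior, and everyone else gets the newer snapshot by default.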
I’ve reached out to the team to see if they can share any tips or best practices for getting gpt-4o-mini-tts-2025-12-15 to follow instructions more reliably and consistently.