New Audio Model Snapshots in the Realtime-API

VeitB · January 16, 2026, 5:51am

The new gpt-4o-mini-tts-2025-12-15 snapshot responds to style instructions differently from the previous gpt-4o-mini-tts-2025-03-20 version.

Goal
Control the style and tone of text-to-speech output, for example a whispering voice.

Challenge
With gpt-4o-mini-tts-2025-03-20, a simple prompt like:

You are always whispering

worked reliably in most cases. With the new snapshot, the same instruction is followed far less consistently, closer to three out of ten attempts. To benefit from the improved, lower word error rate of the new snapshot, the prompting approach needs to change.
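For anyone who wants to reproduce the comparison, here is a minimal sketch of how I would send the same instruction to a pinned snapshot with the official `openai` Python package. The helper names `build_speech_request` and `synthesize`, the `alloy` voice, and the output file name are my own choices for illustration, not something from the thread. It assumes `OPENAI_API_KEY` is set in the environment.

```python
def build_speech_request(text, instructions,
                         model="gpt-4o-mini-tts-2025-12-15", voice="alloy"):
    """Build the keyword arguments for a speech request.

    Pinning the dated snapshot name (instead of the bare model alias)
    keeps results comparable when the default snapshot changes.
    """
    return {
        "model": model,
        "voice": voice,                # assumed voice, pick any supported one
        "input": text,                 # the text to be spoken
        "instructions": instructions,  # the style/tone prompt
    }


def synthesize(request, out_path="whisper.mp3"):
    """Send the request and write the audio to out_path (needs network + API key)."""
    from openai import OpenAI  # imported lazily so the builder works without the SDK

    client = OpenAI()
    with client.audio.speech.with_streaming_response.create(**request) as response:
        response.stream_to_file(out_path)


if __name__ == "__main__":
    req = build_speech_request(
        "The meeting starts in five minutes.",
        "You are always whispering",
    )
    print(req["model"])
    # synthesize(req)  # uncomment to actually call the API
```

Swapping the `model` argument to `gpt-4o-mini-tts-2025-03-20` gives the side-by-side comparison described above.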

Solution
The team shared the Realtime Prompting Guide from cookbook.openai.com. The key takeaway is that this snapshot needs to be guided the same way as the realtime models when enforcing style and tone constraints. They also linked an example prompt as baseline guidance and an optimizer prompt that can be used to remove ambiguity from the wording.
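To make "guide it like a realtime model" a bit more concrete, this is a rough sketch of how I would turn a one-line style hint into a more explicit, sectioned instruction string. The section labels and the helper name `build_style_instructions` are my own assumptions, not the wording of the linked example prompt, so treat it as a starting point only.

```python
def build_style_instructions(delivery, rules):
    """Assemble a sectioned instruction string from a delivery description
    and a list of short, unambiguous rules (labels are illustrative)."""
    lines = ["# Delivery", delivery, "", "# Rules"]
    lines += [f"- {rule}" for rule in rules]
    return "\n".join(lines)


if __name__ == "__main__":
    instructions = build_style_instructions(
        "Whisper every word, as if speaking quietly in a library.",
        [
            "Never raise your voice above a whisper.",
            "Keep whispering for the entire utterance, including the last sentence.",
        ],
    )
    print(instructions)
```

The resulting string would go into the `instructions` parameter of the speech request in place of the single-sentence prompt.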

My experience
I still struggle with this and get proper instruction following only around 50% of the time. For now I second @Dobo's approach of using the older snapshot when precise style and tone control is needed. Maybe we should create a topic in the prompting category to see how far others can take this model with their prompting skills.
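In case it helps others decide, here is a small sketch of how the fallback could be made explicit: tally manually judged attempts into a compliance rate, then route to the older snapshot only when strict style control is required and the measured rate is too low. The function names and the 0.9 threshold are hypothetical choices of mine, not a recommendation from the team.

```python
OLD_SNAPSHOT = "gpt-4o-mini-tts-2025-03-20"
NEW_SNAPSHOT = "gpt-4o-mini-tts-2025-12-15"


def compliance_rate(judgments):
    """Fraction of attempts judged to follow the style instruction.

    judgments is a list of booleans from listening to each output by hand.
    """
    if not judgments:
        raise ValueError("need at least one judged attempt")
    return sum(judgments) / len(judgments)


def pick_snapshot(strict_style, measured_rate, threshold=0.9):
    """Fall back to the older snapshot only when style must be precise
    and the new snapshot's measured rate is below the (assumed) threshold."""
    if strict_style and measured_rate < threshold:
        return OLD_SNAPSHOT
    return NEW_SNAPSHOT


if __name__ == "__main__":
    # Example: three out of ten attempts followed the whisper instruction.
    rate = compliance_rate([True] * 3 + [False] * 7)
    print(pick_snapshot(strict_style=True, measured_rate=rate))
```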

@aprendendo.next: FYI: The default snapshot for this model has been updated to the newer version, in case others are wondering why their output suddenly changed.
