gpt-4o-mini-tts speed and unnatural voice

The new model snapshot clearly lowers the word error rate. I shared evaluation results in the topic linked below, and my own testing confirms the improvement. It would be perfect if it were equally steerable.