I just tested with:
Model: gpt-4o-mini-tts
Voice: Marin
Instructions: Voice: Deep, hushed, and enigmatic, with a slow, deliberate cadence that draws the listener. Phrasing: Sentences are short and rhythmic, building tension with pauses and carefully placed suspense. Punctuation: Dramatic pauses, ellipses, and abrupt stops enhance the feeling of unease and anticipation. Tone: Dark, ominous, and foreboding, evoking a sense of mystery and the unknown.
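For anyone who wants to reproduce this, a test with these settings can be sketched roughly as below. This assumes the official `openai` Python package and an `OPENAI_API_KEY` in the environment. The lowercase voice id `"marin"`, the sample input text, the output filename, and the helper names `build_request` and `synthesize` are assumptions for illustration, not part of the original test.

```python
from pathlib import Path

MODEL = "gpt-4o-mini-tts"
VOICE = "marin"  # assumption: the API expects the lowercase id for the "Marin" voice

# Instruction text copied from the settings above.
INSTRUCTIONS = (
    "Voice: Deep, hushed, and enigmatic, with a slow, deliberate cadence "
    "that draws the listener. "
    "Phrasing: Sentences are short and rhythmic, building tension with pauses "
    "and carefully placed suspense. "
    "Punctuation: Dramatic pauses, ellipses, and abrupt stops enhance the "
    "feeling of unease and anticipation. "
    "Tone: Dark, ominous, and foreboding, evoking a sense of mystery and the unknown."
)


def build_request(text: str, instructions: str = INSTRUCTIONS) -> dict:
    """Collect the keyword arguments for one speech request (hypothetical helper)."""
    return {
        "model": MODEL,
        "voice": VOICE,
        "input": text,
        "instructions": instructions,
    }


def synthesize(text: str, out_path: Path, instructions: str = INSTRUCTIONS) -> None:
    """Send the request and write the returned audio to out_path."""
    from openai import OpenAI  # imported here so the helper above works without the SDK

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with client.audio.speech.with_streaming_response.create(
        **build_request(text, instructions)
    ) as response:
        response.stream_to_file(out_path)


if __name__ == "__main__":
    # Placeholder input text; swap in whatever line you want to hear.
    synthesize("The door creaked open... and then, nothing.", Path("marin_test.mp3"))
```

Calling `synthesize` a second time with a different `instructions` string and a different filename gives two files to compare side by side, which makes it easier to judge whether the instructions have any audible effect.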
You are right. The instructions were not properly followed, and the result was a rather bland mood.
EDIT: Just tested with a completely different voice instruction and the result was not much different from the first.
This is very disappointing. I don’t use snapshot models.