N.B. This bug does not happen on every request. I found that if I request the word “Research” or “Academia” 10 times I would see the bug at least once.
A quick fix: put a period and space as the first characters of your request.
Then investigate: is the audio really clipped at the beginning, or is it a problem with the playback application not waiting until devices are ready. Open your audio file in an audio editor and insert silence at the beginning to see if you got what was requested and it is just not rendering properly.
My tests, like yours, suggest that this workaround is quite reliable (I’ve not seen it fail, yet). That said, I would not be surprised if it fails in some cases: in other contexts I’ve noticed that the “[pause]” string usually adds a pause—but not always.
I hope OpenAI will release a proper fix for this bug.