Are you streaming? Although I haven't noticed it recently (my own fine-tunes are test models, not in regular use), there was a period where warming up a fine-tuned model could take up to 15 seconds before the first token arrived.
By the fifth second of silence, you could quietly fire off a second request in parallel and close() whichever one loses the race to the first 50 tokens.
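Something like this sketch of the race, assuming the v1 openai Python SDK's AsyncOpenAI streaming interface (stream_completion and race_requests are names I made up, and error handling is elided):

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()  # assumes OPENAI_API_KEY is set in the environment

async def stream_completion(model, messages, first_token: asyncio.Event):
    """Stream one chat completion; set first_token when content arrives."""
    stream = await client.chat.completions.create(
        model=model, messages=messages, stream=True
    )
    try:
        chunks = []
        async for event in stream:
            if event.choices and event.choices[0].delta.content:
                first_token.set()
                chunks.append(event.choices[0].delta.content)
        return "".join(chunks)
    finally:
        await stream.close()  # drop the connection if this task is cancelled

async def race_requests(model, messages, warmup_timeout=5.0):
    """Launch a second request if the first is silent past warmup_timeout,
    then keep whichever request streams a token first."""
    evt_a, evt_b = asyncio.Event(), asyncio.Event()
    task_a = asyncio.create_task(stream_completion(model, messages, evt_a))
    try:
        await asyncio.wait_for(evt_a.wait(), warmup_timeout)
        return await task_a  # the first request warmed up in time
    except asyncio.TimeoutError:
        pass  # still nothing: start the rival request
    task_b = asyncio.create_task(stream_completion(model, messages, evt_b))
    waiters = [asyncio.create_task(evt_a.wait()),
               asyncio.create_task(evt_b.wait())]
    _, pending = await asyncio.wait(waiters, return_when=asyncio.FIRST_COMPLETED)
    for w in pending:
        w.cancel()
    winner, loser = (task_a, task_b) if evt_a.is_set() else (task_b, task_a)
    loser.cancel()  # close() the stream of whichever lost the race
    return await winner
```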
Your usage tier also affects your token production rate (what OpenAI, on the rare occasions they mentioned the low-tier penalty, mislabeled as "latency"); tier-1 is hit the hardest.
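If you want numbers rather than impressions, a quick way to see both effects (warm-up delay and per-tier production rate) is to time a streamed call. A rough sketch with the openai Python SDK, where measure_stream is my own name and streamed chunks are treated as a loose stand-in for tokens:

```python
import time
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def measure_stream(model: str, prompt: str):
    """Report time-to-first-token and a rough production rate for one call."""
    start = time.perf_counter()
    first = None   # seconds until the first content chunk
    n_chunks = 0
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for event in stream:
        if event.choices and event.choices[0].delta.content:
            if first is None:
                first = time.perf_counter() - start
            n_chunks += 1
    total = time.perf_counter() - start
    if first is not None and total > first:
        rate = n_chunks / (total - first)
        print(f"first token: {first:.2f}s, ~{rate:.1f} chunks/s, {total:.2f}s total")
    else:
        print(f"no content received after {total:.2f}s")
```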
You are in a rare club, fine-tuning gpt-4. Nothing about it is documented except that it exists and requires approval, not even the price (a price you could tell us about!)