OpenAI's only real solution to slow token generation speed is to move the customer-facing AI to gpt-3.5-turbo.
You can see the recent improvement in GPT-4's completion time for 250 tokens (top blue line), which corresponds almost exactly with the drop in "GPT-4 no longer making long outputs" complaints as load decreased.