You report a generation rate of 60-133 tokens per second on gpt-4o, inclusive of network latency and input processing. That alias currently points to gpt-4o-2024-08-06 (there are two other dated gpt-4o snapshots you could also try) on the Chat Completions endpoint.
That is quite normal and actually pretty impressive.
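If you want to isolate pure generation speed from time-to-first-token, streaming makes that easy. Here is a minimal sketch using the openai Python SDK that starts the clock at the first streamed chunk, so latency and input processing are excluded; the prompt is just a placeholder, and chunk count is only a rough proxy for token count:

```python
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Write a 300-word story."}],
    stream=True,
)

first_token_at = None
tokens = 0
for chunk in stream:
    # some chunks carry no content delta; skip those
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.monotonic()  # clock starts at first token
        tokens += 1  # one content chunk ~ one token (rough proxy)

if first_token_at is not None:
    elapsed = time.monotonic() - first_token_at
    print(f"~{tokens / max(elapsed, 1e-9):.0f} tokens/s over {tokens} chunks")
```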
If you are willing to double your costs, you can set the "service_tier": "priority" API parameter to stay at the high end of performance, a service level above 90 TPS.
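A minimal sketch of setting that parameter on a Chat Completions call (again with a placeholder prompt; check current pricing first, since priority requests bill at a higher rate):

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Hello!"}],
    service_tier="priority",  # request priority processing
)

# the response echoes the tier that actually served the request
print(response.service_tier)
```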