Thanks for the help @_j. At this point, >99% of my input tokens in each call are hitting the cache, which is awesome because it’s 90% cheaper and quite a bit faster than non-cached input. I’m sticking with 5 concurrent calls at runtime because this program will run 1-2 times per minute on average, and once you get up near 15 requests per minute, you risk getting routed to a different server that doesn’t have the cache stored.
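In case it's useful to anyone else doing the same tuning: the cap on in-flight requests can be enforced with an `asyncio.Semaphore`. This is just a sketch, not my production code; the model name, shared prefix, and prompts below are placeholders.

```python
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
MAX_CONCURRENT = 5      # stay well under the rate that risks a cache-miss reroute
semaphore = asyncio.Semaphore(MAX_CONCURRENT)

# Long static prefix shared by every request so it can hit the prompt cache (placeholder).
SHARED_PREFIX = "<long static system prompt>"


async def cached_call(user_message: str) -> str:
    # The semaphore guarantees at most MAX_CONCURRENT requests are in flight at once.
    async with semaphore:
        resp = await client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model
            messages=[
                {"role": "system", "content": SHARED_PREFIX},
                {"role": "user", "content": user_message},
            ],
        )
        return resp.choices[0].message.content


async def main(batch: list[str]) -> list[str]:
    return await asyncio.gather(*(cached_call(m) for m in batch))


if __name__ == "__main__":
    print(asyncio.run(main(["example question 1", "example question 2"])))
```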
The only other thing that would make a meaningful impact for me is getting data on server strain at different times of the week, so that I can simply avoid doing research/engineering when the servers are overloaded. While OpenAI doesn't make this data public, I did find a post you made last year benchmarking performance throughout the week. Are you open to sharing how you gathered that data? I'd like to collect some more recent data (perhaps a real-time dashboard?) and make it publicly available so that others don't face the same issue. I imagine this would help a lot of people, since the official API status dashboard reports outages but not latency. If you're interested, we could collaborate!
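To make the ask concrete, what I have in mind is a simple probe that fires a small streamed request every few minutes, logs time-to-first-token and total latency with a UTC timestamp, and appends the rows to a file for later binning by hour-of-week. A rough sketch, with the model, interval, and output path as placeholders (I'd adjust to match whatever methodology you used):

```python
import csv
import time
from datetime import datetime, timezone

from openai import OpenAI

client = OpenAI()
PROBE_MODEL = "gpt-4o-mini"   # placeholder model
INTERVAL_SECONDS = 300        # one probe every 5 minutes
LOG_PATH = "latency_log.csv"  # placeholder output path


def probe_once() -> tuple[float, float]:
    """Return (seconds to first token, total seconds) for one small streamed request."""
    start = time.perf_counter()
    first_token = None
    stream = client.chat.completions.create(
        model=PROBE_MODEL,
        messages=[{"role": "user", "content": "Reply with the single word: ok"}],
        stream=True,
    )
    for chunk in stream:
        # Record the moment the first content chunk arrives.
        if first_token is None and chunk.choices and chunk.choices[0].delta.content:
            first_token = time.perf_counter() - start
    total = time.perf_counter() - start
    return first_token or total, total


if __name__ == "__main__":
    while True:
        ttft, total = probe_once()
        with open(LOG_PATH, "a", newline="") as f:
            csv.writer(f).writerow(
                [datetime.now(timezone.utc).isoformat(), round(ttft, 3), round(total, 3)]
            )
        time.sleep(INTERVAL_SECONDS)
```

Binning those rows by hour-of-week should reproduce the kind of weekly profile you posted, and the same log could feed a live dashboard. I'd be curious whether your approach differed.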