Rate limiting strategies?

The most common algorithm that I am aware of is “exponential backoff”. Basically, you watch for some trigger condition and double (or triple) the wait time between attempts until the trigger condition clears.

One such condition could be error codes returned by the API, such as 429 “Too Many Requests”.
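For example, a minimal backoff loop might look like this in Python. Note that `send_request` is just a stand-in for whatever API call you make; I'm assuming it returns a `requests`-style response with a `status_code` attribute:

```python
import time

def with_backoff(send_request, max_retries=6, base_delay=1.0):
    """Retry send_request, doubling the wait after each 429.

    send_request is a placeholder for your actual API call; it is
    assumed to return a response with a .status_code attribute.
    """
    delay = base_delay
    for _ in range(max_retries):
        response = send_request()
        if response.status_code != 429:   # trigger condition cleared
            return response
        time.sleep(delay)
        delay *= 2                        # or *= 3 for a tripling policy
    raise RuntimeError(f"still rate limited after {max_retries} retries")
```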

Another condition could be your own rate limits. Here’s how I might implement it, since there are multiple sources generating requests (a rough sketch in code follows the list):

  1. Use a single broker to handle all communication with the OpenAI API
  2. This broker handles all transactions from multiple sources
  3. The broker keeps track of all requests (including a local timestamp for each)
  4. Use some global benchmark (like a 20-requests-per-minute max)
  5. Track a rolling request rate (n requests over the last 60 seconds)
  6. As you approach that limit, increase the delay until the next request
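Here’s a minimal sketch of such a broker in Python. The `Broker` class, the 20-per-minute figure, and the throttling formula are all my own illustration of the list above, not anything official from OpenAI:

```python
import time
import threading
from collections import deque

class Broker:
    """Single point of contact for the API: tracks a rolling request
    rate and throttles as the global limit is approached.
    """
    def __init__(self, max_per_minute=20, window=60.0):
        self.max_per_minute = max_per_minute
        self.window = window
        self.timestamps = deque()      # local timestamps of recent requests
        self.lock = threading.Lock()   # multiple sources share one broker

    def _rolling_count(self, now):
        # Drop timestamps older than the window, then count the rest.
        while self.timestamps and now - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        return len(self.timestamps)

    def acquire(self):
        """Block until a request may be sent, then record it."""
        while True:
            with self.lock:
                now = time.monotonic()
                count = self._rolling_count(now)
                if count < self.max_per_minute:
                    self.timestamps.append(now)
                    # Delay grows as the rolling rate nears the limit:
                    # 0 s when idle, almost one full slot (3 s here) when
                    # the window is nearly full.
                    delay = (count / self.max_per_minute) * (self.window / self.max_per_minute)
                    break
            time.sleep(0.1)  # window full: wait for old timestamps to expire
        if delay:
            time.sleep(delay)
```

Every source would call `broker.acquire()` immediately before sending its request, so the rolling window sees all traffic in one place.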

Since you have multiple sources generating requests, you can also track the same information as above for each individual requestor. Say each individual service is allowed 5 requests per minute; the broker holds a queue per source and spaces out the requests according to the queue depth (if the queue depth is >= 5, the spacing becomes 12 seconds, for example).
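As a sketch, that spacing rule might be computed like this (the 5-per-minute limit and the resulting 12-second spacing are just the example numbers from above):

```python
import time
from collections import deque

def request_spacing(queue_depth, per_source_limit=5, window=60.0):
    """Seconds between one source's requests, given its queue depth.

    With the example numbers (5 requests per minute), a queue depth
    of 5 or more yields 60 / 5 = 12 seconds of spacing.
    """
    if queue_depth >= per_source_limit:
        return window / per_source_limit
    return 0.0

def drain(queue, send):
    """Send one source's queued requests with the computed spacing."""
    while queue:
        time.sleep(request_spacing(len(queue)))
        send(queue.popleft())
```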

All of that being said, I think it’s overkill. Some of my experiments run several hundred Curie completions per minute without issue, although these occur in bursts, not as a sustained rate. Maybe OpenAI hates users like me :slight_smile: Then again, we pay for tokens, so they want to serve us as fast as possible.
