We are getting a flood of errors like this:
response: {
  status: 429,
  statusText: 'Too Many Requests',
  headers: {
    'x-ratelimit-limit-requests': '3500',
    'x-ratelimit-limit-tokens': '90000',
    'x-ratelimit-remaining-requests': '3251',
    'x-ratelimit-remaining-tokens': '3964',
    'x-ratelimit-reset-requests': '4.253s',
    'x-ratelimit-reset-tokens': '57.357s',
  },
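For anyone hitting the same thing: the `x-ratelimit-reset-*` headers tell you how long until the window refills, so one mitigation is to wait that long before retrying. A minimal sketch of parsing those duration strings (header names and example values are from the 429 response above; the function name and exact format handling are my assumptions):

```javascript
// Parse an x-ratelimit-reset-* duration string such as "4.253s", "57.357s",
// "1m12s", or "6ms" into whole milliseconds, suitable for a retry delay.
// Assumes the value is a concatenation of number+unit segments (h, m, s, ms).
function parseResetDuration(value) {
  let ms = 0;
  // Match each unit segment; "ms" must be tried before "m" and "s".
  const re = /(\d+(?:\.\d+)?)(ms|s|m|h)/g;
  for (const [, num, unit] of value.matchAll(re)) {
    const n = parseFloat(num);
    if (unit === 'h') ms += n * 3600000;
    else if (unit === 'm') ms += n * 60000;
    else if (unit === 's') ms += n * 1000;
    else ms += n; // 'ms'
  }
  return Math.round(ms);
}

// Example: sleep until the token bucket resets, then retry.
// parseResetDuration('57.357s') -> 57357
```

On a 429 you would read `headers['x-ratelimit-reset-tokens']`, feed it through this, and `setTimeout` the retry for at least that long (ideally with jitter).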
We then check our usage for that period in the usage tab and find (for the entire hour):

Time     | Model               | Requests | Prompt | Completion | Total
8:25 PM  | gpt-3.5-turbo-0301  | 25       | 4,832  | 853        | 5,685
8:30 PM  | gpt-3.5-turbo-0301  | 4        | 1,880  | 298        | 2,178
8:35 PM  | gpt-3.5-turbo-0301  | 1        | 562    | 92         | 654
8:45 PM  | gpt-3.5-turbo-0301  | 42       | 8,012  | 1,611      | 9,623
8:50 PM  | gpt-3.5-turbo-0301  | 49       | 9,385  | 1,736      | 11,121
8:55 PM  | gpt-3.5-turbo-0301  | 68       | 12,612 | 2,874      | 15,486
In other words, we are nowhere near the 90k token limit for the entire HOUR, let alone any one-minute window.
Anyone have any clues? We're banging our heads here, wondering if OpenAI somehow enforces a more granular, second-based rate limit (e.g., 90,000 / 60 = 1,500 tokens max per second??)
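For what it's worth, the 429 headers themselves may be the better evidence here than the usage dashboard: the per-second split below is purely my hypothetical, but the remaining-token figure is copied straight from the response above, and it shows the limiter's own window nearly exhausted even while the hourly dashboard looks quiet.

```javascript
// Numbers copied from the 429 response headers above; the per-second
// figure is the hypothetical split being asked about, not documented fact.
const limitTokens = 90000;     // x-ratelimit-limit-tokens
const remainingTokens = 3964;  // x-ratelimit-remaining-tokens

const perSecond = limitTokens / 60;                 // 1500 (hypothetical)
const countedByLimiter = limitTokens - remainingTokens; // 86036

console.log(perSecond, countedByLimiter);
```

The gap between the 86,036 tokens the limiter appears to have counted and the ~45k tokens the dashboard shows for the whole hour is what needs explaining (e.g., does the limiter reserve `max_tokens` per in-flight request rather than counting actual completion tokens?).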