Hi,
OpenAI sends rate-limit response headers with every chat completion it returns.
According to the docs here https://platform.openai.com/docs/guides/rate-limits/rate-limits-in-headers
you can use these headers to determine how many tokens remain for the day, or how many requests remain for the current minute.
The problem is that these counts keep resetting: the remaining-requests value always comes back as the maximum minus 1, and the remaining-tokens value as the maximum minus only the tokens used by the current request.
So for example:
Say I have a prompt that translates to 50 tokens. My daily allowed token usage is 500,000, and my requests-per-minute limit is 5,000.
When I send the prompt, I'll get the value 4999 for the header x-ratelimit-remaining-requests and 499950 for the header x-ratelimit-remaining-tokens. But when I send another request immediately after, I'll get the same values back. Instead of 4998 requests left it will return 4999 again, and if the second prompt also uses 50 tokens, the remaining-tokens header will again be 499950 instead of 499900.
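For anyone who wants to reproduce this, here is a minimal sketch of pulling the documented rate-limit headers out of a response. The header names come from the linked doc; the helper function and the sample values are just illustrative (the sample mirrors the numbers above, not a live API response):

```python
def parse_rate_limit_headers(headers):
    """Extract the rate-limit fields OpenAI documents, if present.
    HTTP header names are case-insensitive, so normalize to lowercase first."""
    h = {k.lower(): v for k, v in headers.items()}
    keys = [
        "x-ratelimit-limit-requests",
        "x-ratelimit-remaining-requests",
        "x-ratelimit-limit-tokens",
        "x-ratelimit-remaining-tokens",
    ]
    # Values arrive as strings; cast the ones that are plain integers.
    return {k: int(h[k]) for k in keys if k in h}

# Hypothetical headers matching the example values in this post:
sample = {
    "X-Ratelimit-Limit-Requests": "5000",
    "X-Ratelimit-Remaining-Requests": "4999",
    "X-Ratelimit-Limit-Tokens": "500000",
    "X-Ratelimit-Remaining-Tokens": "499950",
}
print(parse_rate_limit_headers(sample))
```

If the headers behaved as documented, a second request sent immediately after should show 4998 remaining requests, which is exactly what does not happen.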
Clearly the OpenAI API does not keep a running count of how many requests you've actually sent, so these headers are absolutely useless.