I’d like to tell my user how many requests they have left before they start running a bulk API task.
The best information on remaining rate limits comes from the x-ratelimit-* response headers (x-ratelimit-limit-requests, x-ratelimit-remaining-requests, and so on) - but those headers are only available after you run a request against the model in question.
gpt-4-vision-preview is currently limited to 100 requests a day, and I don’t want to have to make a call against it just to figure out what remaining request limit I can show my user!
Is there a way to get back those rate limit headers without spending a request? If not, could one be added?
Python? The fast answer:
apiresponse = client.chat.completions.with_raw_response.create(...)
Then you can use
apiresponse.headers.get('x-ratelimit-...') to read a single header; .headers is a case-insensitive mapping of the response headers, not a list of tuples.
Example dump of all header names and values:
for name, value in apiresponse.headers.items():
    print(name, value)
The rate limit, though, is being consumed at 2 requests per successful vision call, and 1 even on failure, and the reset policy is strange - after a handful of calls you seem to be back to 200 a few hours later, apparently based on tokens.
Then you have to extract the normal reply object with
response = apiresponse.parse()
which gives you the usual pydantic model from the new Python SDK - but that complicates things like streaming.
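Putting those pieces together, here is a minimal sketch. The actual SDK call is shown only in comments (it needs an API key and would consume a request); the helper just filters whatever headers mapping you hand it, so the x-ratelimit-* names below are the documented ones, but the values are made up for illustration:

```python
# Sketch of the raw-response pattern with the openai>=1.x Python SDK.
# The real call would look like:
#
#   from openai import OpenAI
#   client = OpenAI()
#   apiresponse = client.chat.completions.with_raw_response.create(
#       model="gpt-4-vision-preview",
#       messages=[{"role": "user", "content": "ping"}],
#       max_tokens=1,
#   )
#   response = apiresponse.parse()   # the usual ChatCompletion pydantic model
#   headers = apiresponse.headers    # case-insensitive mapping of response headers

def extract_rate_limits(headers):
    """Keep only the x-ratelimit-* headers, lowercasing the names."""
    return {
        name.lower(): value
        for name, value in headers.items()
        if name.lower().startswith("x-ratelimit")
    }

# Made-up values standing in for apiresponse.headers:
fake_headers = {
    "Content-Type": "application/json",
    "x-ratelimit-limit-requests": "100",
    "x-ratelimit-remaining-requests": "94",
    "x-ratelimit-reset-requests": "14h21m0s",
}
print(extract_rate_limits(fake_headers))
```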
That still burns a request in order to read the rate limit headers. I want to be able to read those rate limit headers without decrementing my available requests for the day by one (especially for vision).
That is not “burning” if you obtain the header values at the same time as a vision request is fulfilled anyway.
It doesn’t get your current status hours or a day later, though.
I want to show my user a message that says “You have 94 images left today, which resets in 14 hours” - before they have processed any images.
I would like access to those numbers without having to first send a request through the vision API to get them, which would burn one of those daily requests.
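For what it’s worth, here is a hedged sketch of how that user-facing message could be assembled once you do have the two header values from some earlier response. The “14h21m0s”-style duration format for x-ratelimit-reset-requests is an assumption about what the API returns; verify it against your own responses:

```python
import re

# Seconds per unit for d/h/m/s duration components (assumed reset format).
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}

def reset_in_hours(duration):
    """Convert a '14h21m0s'-style reset string to a number of hours."""
    total_seconds = 0.0
    for value, unit in re.findall(r"(\d+(?:\.\d+)?)([dhms])", duration):
        total_seconds += float(value) * _UNIT_SECONDS[unit]
    return total_seconds / 3600

def quota_message(remaining, reset):
    """Build the message from the remaining-requests and reset header values."""
    hours = round(reset_in_hours(reset))
    return f"You have {remaining} images left today, which resets in {hours} hours"

print(quota_message("94", "14h21m0s"))
# -> You have 94 images left today, which resets in 14 hours
```

Given the observation earlier in the thread that a successful vision call seems to consume 2 requests, you might divide the remaining count accordingly before showing it.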