I’d like to tell my user how many requests they have left before they start running a bulk API task.
The best information on remaining rate limits comes from the x-ratelimit-* response headers (x-ratelimit-limit-requests, x-ratelimit-remaining-requests, and so on) - but those headers are only available after you run a request against the model in question.
gpt-4-vision-preview is currently limited to 100 requests a day, and I don’t want to have to make a call against it just to figure out what remaining request limit I can show my user!
Is there a way to get back those rate limit headers without spending a request? If not, could one be added?
Python? The fast answer:
apiresponse = client.chat.completions.with_raw_response.create(...)
Then you can use
apiresponse.headers.get('x-ratelimit-...') to read a single header; .headers is a case-insensitive mapping of the response headers, not a list of tuples.
Example dump of all header names and values:
for name, value in apiresponse.headers.items():
    print(name, value)
The rate limit, though, is being consumed at 2 requests per successful vision call, and 1 even on failure, and the reset policy is strange - after a handful of calls you seem to be back to 200 a few hours later, apparently based on tokens.
Then you have to extract the normal reply object with
response = apiresponse.parse()
which gives you the usual pydantic model from the new Python SDK - but that complicates things like streaming.
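Putting those pieces together, here is a minimal sketch. The actual SDK call is shown only in comments (it needs an API key and would consume a request); the helper just filters whatever headers mapping you hand it, so the x-ratelimit-* names below are the documented ones, but the values are made up for illustration:

```python
# Sketch of the raw-response pattern with the openai>=1.x Python SDK.
# The real call would look like:
#
#   from openai import OpenAI
#   client = OpenAI()
#   apiresponse = client.chat.completions.with_raw_response.create(
#       model="gpt-4-vision-preview",
#       messages=[{"role": "user", "content": "ping"}],
#       max_tokens=1,
#   )
#   response = apiresponse.parse()   # the usual ChatCompletion pydantic model
#   headers = apiresponse.headers    # case-insensitive mapping of response headers

def extract_rate_limits(headers):
    """Keep only the x-ratelimit-* headers, lowercasing the names."""
    return {
        name.lower(): value
        for name, value in headers.items()
        if name.lower().startswith("x-ratelimit")
    }

# Made-up values standing in for apiresponse.headers:
fake_headers = {
    "Content-Type": "application/json",
    "x-ratelimit-limit-requests": "100",
    "x-ratelimit-remaining-requests": "94",
    "x-ratelimit-reset-requests": "14h21m0s",
}
print(extract_rate_limits(fake_headers))
```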
That still burns a request in order to read the rate limit headers. I want to be able to read those rate limit headers without decrementing my available requests for the day by one (especially for vision).
That is not “burning” if you obtain the header values at the same time as a vision request is fulfilled anyway.
It doesn’t get your current status hours or a day later, though.
I want to show my user a message that says “You have 94 images left today, which resets in 14 hours” - before they have processed any images.
I would like access to those numbers without having to first send a request through the vision API to get them, which would burn one of those daily requests.
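For what it’s worth, here is a hedged sketch of how that user-facing message could be assembled once you do have the two header values from some earlier response. The “14h21m0s”-style duration format for x-ratelimit-reset-requests is an assumption about what the API returns; verify it against your own responses:

```python
import re

# Seconds per unit for d/h/m/s duration components (assumed reset format).
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}

def reset_in_hours(duration):
    """Convert a '14h21m0s'-style reset string to a number of hours."""
    total_seconds = 0.0
    for value, unit in re.findall(r"(\d+(?:\.\d+)?)([dhms])", duration):
        total_seconds += float(value) * _UNIT_SECONDS[unit]
    return total_seconds / 3600

def quota_message(remaining, reset):
    """Build the message from the remaining-requests and reset header values."""
    hours = round(reset_in_hours(reset))
    return f"You have {remaining} images left today, which resets in {hours} hours"

print(quota_message("94", "14h21m0s"))
# -> You have 94 images left today, which resets in 14 hours
```

Given the observation earlier in the thread that a successful vision call seems to consume 2 requests, you might divide the remaining count accordingly before showing it.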