How do I get the rate limit reset time for the Responses API?

With the completions API, I can get this from the response headers:

x-ratelimit-limit-requests: 5000
x-ratelimit-limit-tokens: 160000
x-ratelimit-limit-tokens_usage_based: 160000
x-ratelimit-remaining-requests: 4999
x-ratelimit-remaining-tokens: 159976
x-ratelimit-remaining-tokens_usage_based: 159976
x-ratelimit-reset-requests: 12ms
x-ratelimit-reset-tokens: 9ms
x-ratelimit-reset-tokens_usage_based: 9ms

But I couldn’t find it in the Responses API response headers.
How do I get x-ratelimit-reset-requests for the Responses API?

If you are making the HTTP calls yourself, the headers are the same.
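For the do-it-yourself case, here is a minimal sketch using only the standard library. The helper names build_request and fetch_headers are made up for illustration, and the model name in the usage comment is just a placeholder assumption. It reads the headers from an error response too, since urllib raises on non-2xx statuses.

```python
import json
import urllib.error
import urllib.request

API_URL = "https://api.openai.com/v1/responses"


def build_request(api_key, payload):
    """Build a POST request for the Responses endpoint (hypothetical helper)."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_headers(api_key, payload):
    """Send the request and return the response headers keyed by lowercase name."""
    try:
        with urllib.request.urlopen(build_request(api_key, payload)) as resp:
            raw = resp.headers
    except urllib.error.HTTPError as err:
        # Error responses carry headers as well.
        raw = err.headers
    # If a header repeats, the last value wins in this plain dict.
    return {name.lower(): value for name, value in raw.items()}


# Usage (needs a real key and network access; the model name is a placeholder):
# headers = fetch_headers(my_api_key, {"model": "gpt-4.1-mini", "input": "Say hi."})
# print(headers.get("x-ratelimit-reset-requests"))
```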

If you are using Responses through the Python SDK, for example, it is quite similar:

    # inside an async function; call arguments elided
    response = await client.responses.with_raw_response.create(..)

The response you get back is then not the typical pydantic model. It is a raw-response wrapper with higher-level methods, including access to the HTTP headers and a parse() method.

    print("headers")
    print(response.headers)

This prints an (annoying) list of tuples:

Headers([('date', 'Mon, 26 May 2025 05:00:04 GMT'), ('content-type', 'application/json'), ('transfer-encoding', 'chunked'), ('connection', 'keep-alive'), ('x-ratelimit-limit-requests', '30000'), ('x-ratelimit-limit-tokens', '150000000'),
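If you only care about the rate limit entries, a small filter makes that output easier to read. This is a sketch: rate_limit_headers is a made-up helper, and it assumes the headers object either has an .items() method or is an iterable of (name, value) pairs like the list printed above.

```python
def rate_limit_headers(headers):
    """Return only the x-ratelimit-* headers as a plain dict with lowercase keys."""
    # Accept a mapping-like object (anything with .items()) or an iterable of pairs.
    pairs = headers.items() if hasattr(headers, "items") else headers
    return {
        name.lower(): value
        for name, value in pairs
        if name.lower().startswith("x-ratelimit-")
    }


# Sample pairs copied from the printed output above.
sample = [
    ("date", "Mon, 26 May 2025 05:00:04 GMT"),
    ("content-type", "application/json"),
    ("x-ratelimit-limit-requests", "30000"),
    ("x-ratelimit-limit-tokens", "150000000"),
]
limits = rate_limit_headers(sample)
print(limits)
```

With the SDK object you would pass response.headers instead of the sample list.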

You can then go on to get your AI output from the same object. For display, you might convert the parsed response to a recursively unpacked dict and then to a JSON string:

    import json
    print(json.dumps(response.parse().model_dump(), indent=2))

BTW: reset-requests is not the amount of time you have to wait. It is the amount of time until the rate limit state would reset completely, as if you had never made any calls at all.
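If you want that value as a number, here is a rough sketch of a parser. parse_reset is a made-up helper, and it assumes the header is a duration string like the 12ms and 9ms shown in the question; support for compound values such as 6m0s is my assumption about the format, so check it against what you actually receive.

```python
import re

# Seconds per unit; "ms" must be tried before "m" and "s" in the pattern.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")


def parse_reset(value):
    """Convert a duration string such as '12ms' or '6m0s' to seconds (float)."""
    value = value.strip()
    total = 0.0
    pos = 0
    for match in _PART.finditer(value):
        if match.start() != pos:
            # Something unparseable sits between two duration parts.
            raise ValueError(f"unrecognized duration: {value!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(value):
        # Nothing matched, or trailing characters were left over.
        raise ValueError(f"unrecognized duration: {value!r}")
    return total


print(parse_reset("12ms"))
print(parse_reset("6m0s"))
```

Keep in mind this number is the time until the full reset, so it is an upper bound rather than a required wait.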
