Other methods to extract raw response

_Taki224 · September 8, 2024, 10:49am

Hi Everyone!

I am building an application that sends many requests to the models. It would be very useful for me to be able to see the remaining tokens, remaining requests, and the reset times. I found in the docs that this information is only accessible from the HTTP response headers.

So far I was able to extract these headers by asking for a raw response with this method:

response = client.chat.completions.with_raw_response.create(
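For reference, a minimal sketch of that call wrapped in a helper, assuming the v1.x openai Python package. The helper name create_with_headers is hypothetical; with_raw_response, .headers and .parse() come from the library.

```python
def create_with_headers(client, **kwargs):
    """Return (parsed completion, rate-limit headers as a dict).

    `client` is expected to be an openai.OpenAI instance (v1.x assumed).
    """
    raw = client.chat.completions.with_raw_response.create(**kwargs)
    # Keep only the rate-limit headers, with lower-cased names
    rate_headers = {
        name.lower(): value
        for name, value in raw.headers.items()
        if name.lower().startswith("x-ratelimit-")
    }
    # .parse() yields the same object a normal .create() call would return
    return raw.parse(), rate_headers


def main():
    """Illustrative call against the real API; not run automatically."""
    from openai import OpenAI  # needs `pip install openai` and OPENAI_API_KEY

    completion, rate_headers = create_with_headers(
        OpenAI(),
        model="gpt-4o-2024-08-06",
        messages=[{"role": "user", "content": "Hello robot"}],
        max_tokens=25,
    )
    print(completion.choices[0].message.content)
    print(rate_headers)
```

Calling main() sends one real request; the header values come back as strings and still need converting.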

My problem is that it is only available with the OpenAI library.
For advanced features I tried to implement solutions from the instructor library, but I couldn’t find a way to extract the header information there, since the raw HTTP response cannot be accessed.

For the same reason I cannot use LangChain: the raw HTTP response is not exposed there either.

If someone could help me with this issue, I would greatly appreciate it!

_j · September 8, 2024, 12:28pm

RESTful requests can be made to the API by any client supporting HTTPS with modern ciphers.

I see you are using Python and the OpenAI library, one of whose dependencies is the httpx module, which offers an API broadly compatible with the requests module. I will add header extraction to existing code I have posted on the forum before, using the rate-limit documentation and AI knowledge of the module.

One more requirement: not all x- header values are integers. The reset values are durations, so let’s make them programmatically accessible as floats. The remaining rate-limit headers should be converted into integers.

Here is the modified code that will extract the headers starting with ‘x-’, load them into a dictionary, and print them after the response. It also includes the conversion of time values into seconds.

It follows best practice by retrieving the API key from an environment variable.

import os, httpx, json, re

apikey = os.environ.get("OPENAI_API_KEY")
url = "https://api.openai.com/v1/chat/completions"
headers = {
    "OpenAI-Beta": "assistants=v2",
    "Authorization": f"Bearer {apikey}"
}
body = {
    "model": "gpt-4o-2024-08-06", "max_tokens": 25, "top_p": 0.8,
    "messages": [
        {"role": "user",
         "content": [{"type": "text", "text": "Hello robot"}]
        }
    ]}

try:
    response = httpx.post(url, headers=headers, json=body)
    response_json = response.json()
    print(json.dumps(response_json, indent=3))

    # Extract headers starting with 'x-' and load them into a dictionary
    x_headers = {k: v for k, v in response.headers.items() if k.lower().startswith('x-')}

    # Convert time values into seconds
    time_multipliers = {'h': 3600, 'm': 60, 's': 1, 'ms': 0.001}
    rate_headers = ['x-ratelimit-limit-requests', 'x-ratelimit-limit-tokens', 
                    'x-ratelimit-remaining-requests', 'x-ratelimit-remaining-tokens', 
                    'x-ratelimit-reset-requests', 'x-ratelimit-reset-tokens']
    for key in rate_headers:
        if key in x_headers:
            if 'reset' in key:
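                total_time = 0
                # Match number/unit pairs with explicit units ('ms' tried before 'm' and 's').
                # Decimal amounts such as '1.5s' are tolerated (an assumption about the format).
                for amount, unit in re.findall(r'(\d+(?:\.\d+)?)(ms|h|m|s)', x_headers[key]):
                    number = float(amount) if '.' in amount else int(amount)
                    total_time += number * time_multipliers[unit]
                x_headers[key] = total_time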
            else:
                x_headers[key] = int(x_headers[key])

    # Print the headers
    print("\nHeaders starting with 'x-':")
    for key, value in x_headers.items():
        print(f"{key}: {value}")

except Exception as e:
    print(e)
    raise

Executing the code first prints the response body, parsed into a dictionary that you can work with and dumped as JSON:

{
   "id": "chatcmpl-djfaoijdfojad",
   "object": "chat.completion",
   "created": 1725797974,
   "model": "gpt-4o-2024-08-06",
   "choices": [
      {
         "index": 0,
         "message": {
            "role": "assistant",
            "content": "Hello! How can I assist you today?",
            "refusal": null
         },
         "logprobs": null,
         "finish_reason": "stop"
      }
   ],
   "usage": {
      "prompt_tokens": 9,
      "completion_tokens": 9,
      "total_tokens": 18
   },
   "system_fingerprint": "fp_8e1177b306"
}

The code then prints the headers starting with ‘x-’. The headers that contain time values (those with ‘reset’ in the name) are converted into seconds. The time values are assumed to be in a format such as ‘5s’ or ‘6m0s’, where ‘s’ stands for seconds and ‘m’ stands for minutes; an undocumented ‘ms’ (milliseconds) unit also shows up.
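The duration conversion described here can also be pulled out into a small standalone helper. This is only a sketch: the name reset_to_seconds is made up for illustration, and accepting decimal amounts such as ‘1.5s’ is an assumption about the format rather than something the documentation promises.

```python
import re

# Seconds per unit; in the pattern below 'ms' is listed before 'm' and 's'
UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")


def reset_to_seconds(value: str) -> float:
    """Convert a reset header value such as '6m0s', '5s' or '6ms' to seconds."""
    parts = DURATION_PART.findall(value)
    if not parts:
        # Fail loudly instead of silently reporting zero seconds
        raise ValueError(f"unrecognised duration: {value!r}")
    return sum(float(amount) * UNIT_SECONDS[unit] for amount, unit in parts)
```

Raising on an unrecognised value is a design choice; returning 0 would make a format change look like "no wait needed".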

The header values printed from the dictionary in which they are stored:

Headers starting with 'x-':
x-ratelimit-limit-requests: 10000
x-ratelimit-limit-tokens: 30000000
x-ratelimit-remaining-requests: 9999
x-ratelimit-remaining-tokens: 29999971
x-ratelimit-reset-requests: 0.006
x-ratelimit-reset-tokens: 0
x-request-id: req_c119113188c3b6bfad56b452b8a75a4f
x-content-type-options: nosniff

(your values may be lower and actually impacted by one request)

I hope this example, which uses the httpx library for Python in a standard and expected way, demonstrates how to build the request parameters as a dictionary, send them to the OpenAI API, and then obtain the additional metadata headers on which you can take rate-limit action. It also shows how you can break free from non-portable, proprietary, input-validating libraries that can break with just one API parameter change.
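As for what "rate-limit action" might look like, here is one possible sketch. It assumes the header dictionary has already been converted as in the code above (remaining values as integers, reset values as seconds); the function name seconds_to_wait and the tokens_needed parameter are hypothetical.

```python
def seconds_to_wait(x_headers: dict, tokens_needed: int = 1) -> float:
    """Return how long to pause before the next request, in seconds.

    Expects converted values: remaining-* as int, reset-* as seconds.
    A missing header is treated as "no limit known", so it never causes a wait.
    """
    wait = 0.0
    # Out of requests: wait until the request budget resets
    if x_headers.get("x-ratelimit-remaining-requests", 1) < 1:
        wait = max(wait, x_headers.get("x-ratelimit-reset-requests", 0))
    # Not enough tokens left for the next call: wait for the token budget
    if x_headers.get("x-ratelimit-remaining-tokens", tokens_needed) < tokens_needed:
        wait = max(wait, x_headers.get("x-ratelimit-reset-tokens", 0))
    return float(wait)
```

A caller could then run time.sleep(seconds_to_wait(x_headers, tokens_needed=25)) before sending the next request.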

_Taki224 · September 8, 2024, 12:50pm

Thank you @_j !

This is also a solution I considered earlier, but getting the headers is not my main problem.

I am trying to use other libraries that are built around the original OpenAI library, such as instructor and pydantic.

To mention just one example: with the pydantic library it is possible to get structured JSON responses and do validation with automatic retries.

These libraries (LangChain included) do not have an option to get raw HTTP responses.

I am looking for a way to pass an argument to the Client or the model, for example, to get the “rate” headers.

_j · September 8, 2024, 1:11pm

You have already found the only way the OpenAI SDK itself offers: it returns the headers as an unprocessed entity when you use the with_raw_response method (in a “legacy” section of the code), which also forces different response parsing.

The pydantic model schema is strict. If you really want to tear apart the openai library beyond what could be subclassed, you could use additional_parameters of the response return object to carry the metadata that you scrape out within the appropriate part of its code. Such changes would be lost immediately with a pip install --upgrade.
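A less invasive route than patching the library, offered only as an untested sketch: the OpenAI client accepts a custom httpx client through its http_client argument, and httpx supports response event hooks, so the headers can be captured one layer below the SDK. The names latest_rate_headers, capture_rate_headers and build_client are hypothetical, and whether instructor or LangChain pass such a client through unchanged is an assumption to verify.

```python
latest_rate_headers = {}  # most recent x-ratelimit-* values seen on any response


def capture_rate_headers(response):
    """httpx response hook: remember the rate-limit headers of each response."""
    for name, value in response.headers.items():
        if name.lower().startswith("x-ratelimit-"):
            latest_rate_headers[name.lower()] = value


def build_client():
    """Create an OpenAI client whose HTTP layer runs the hook above."""
    import httpx
    from openai import OpenAI  # reads OPENAI_API_KEY from the environment

    # The hook fires for every response this client receives
    http_client = httpx.Client(event_hooks={"response": [capture_rate_headers]})
    return OpenAI(http_client=http_client)
```

After any request made through the client returned by build_client(), latest_rate_headers would hold the raw string values from the last response. A shared module-level dictionary is the simplest thing that works here; concurrent callers would need something sturdier.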

_Taki224 · September 8, 2024, 1:26pm

That is what I was afraid of.

These metrics could be very useful; I don’t know why they make them so hard to access when the information is already there.

Anyways, thank you!
