I’ve recently been getting KeyErrors when using gpt-3.5-turbo-16k where the “usage” key is completely missing from the response object. What’s weird is that this happens inconsistently, without any changes to the code:
headers = {
    "Authorization": await get_azure_token(),
    "OCP-Apim-Subscription-key": OPENAI_KEYS[model],
}
# Payload contains everything we're passing to OpenAI and must
# conform to their API.
payload = {
    "messages": [message.model_dump() for message in messages],
    "temperature": kwargs.get("temperature", 1),
}
async with httpx.AsyncClient(
    verify=False, follow_redirects=True, timeout=360
) as client:
    # OPENAI_URLS is Azure's chat completion endpoint URL
    resp = await client.post(url=OPENAI_URLS[model], json=payload, headers=headers)
    # This is the line that raises KeyError when "usage" is missing from the body.
    prompt_tokens = resp.json()["usage"]["prompt_tokens"]
Since you haven't shared code, your traceback only lets us guess at how you're calling the API endpoint.
You're definitely not streaming responses under any circumstance?
(Streamed responses don't include usage.)
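(For illustration, a minimal sketch, not the OP's code, of what the same raw httpx call would look like with streaming turned on: the reply comes back as server-sent-event lines, and the chunks carry no usage block at all.)

import json

async def stream_chat(client, url, headers, payload) -> str:
    # Hypothetical variant of the raw call with "stream": True added.
    async with client.stream(
        "POST", url, json={**payload, "stream": True}, headers=headers
    ) as resp:
        text = ""
        async for line in resp.aiter_lines():
            if not line.startswith("data: ") or line == "data: [DONE]":
                continue
            chunk = json.loads(line[len("data: "):])
            # Chunks only carry "choices[].delta" fragments -- no token counts.
            for choice in chunk.get("choices", []):
                text += choice.get("delta", {}).get("content") or ""
        return text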
Are you working with the openai module's response object or some pydantic return? Whose .json() method is that?
(PydanticDeprecatedSince20: The json method is deprecated; use model_dump_json instead. Deprecated in Pydantic V2.0, to be removed in V3.0. See the Pydantic V2 Migration Guide.)
Getting usage out of openai module’s “await client.chat.completions.with_raw_response.create()”:
print(apiresponse.parse().usage.prompt_tokens)
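In fuller form, something like this with the openai v1 SDK pointed at Azure (the endpoint, key, api_version, and deployment name below are placeholders, not taken from this thread):

from openai import AsyncAzureOpenAI

client = AsyncAzureOpenAI(
    azure_endpoint="https://<resource>.openai.azure.com",  # placeholder
    api_key="<azure-openai-key>",                          # placeholder
    api_version="2023-07-01-preview",                      # assumption
)

async def prompt_tokens_for(messages: list[dict]) -> int:
    apiresponse = await client.chat.completions.with_raw_response.create(
        model="gpt-35-turbo-16k",  # Azure deployment name, placeholder
        messages=messages,
    )
    completion = apiresponse.parse()  # raw HTTP response -> ChatCompletion object
    print(completion.usage.prompt_tokens)
    return completion.usage.prompt_tokens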
I suggest:
- logging the entire response object or the raw response content, to find out whether the API is actually omitting usage (a sketch follows below);
- a fallback: prompt_tokens = resp.json()["usage"]["prompt_tokens"] if "usage" in resp.json() else -1
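Something along these lines, dropped into your httpx snippet (the logger name and the -1 sentinel are my own choices, not from your code):

import logging

logger = logging.getLogger("chat_proxy")  # assumed logger name

async def fetch_prompt_tokens(client, url, payload, headers) -> int:
    resp = await client.post(url=url, json=payload, headers=headers)
    data = resp.json()

    if "usage" not in data:
        # Log the raw body so you can confirm whether Azure really omitted "usage".
        logger.warning(
            "usage missing from response: status=%s body=%s",
            resp.status_code,
            resp.text,
        )
        return -1  # fallback sentinel instead of a KeyError

    return data["usage"]["prompt_tokens"]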
Thanks for the feedback. I’m using Azure’s chat completion endpoint (code added to original post), and calling .json() on the response object. I’m not streaming.
We started seeing the same thing this morning - at first it was intermittent, but now usage is always returned as null. It seems to affect Azure gpt-4-32k, while gpt-4-turbo on OpenAI direct is still returning usage.
I ended up deploying a patch to look for that and skip collecting usage.
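For what it's worth, a minimal sketch of that kind of patch (the helper name is my own): treat usage as optional and tolerate it being null as well as missing.

def extract_usage(data: dict) -> tuple[int | None, int | None]:
    """Return (prompt_tokens, completion_tokens), or Nones when usage is absent or null."""
    usage = data.get("usage") or {}  # covers both a missing key and an explicit null
    return usage.get("prompt_tokens"), usage.get("completion_tokens")

Callers can then skip token accounting for a request whenever both values come back as None, instead of failing the whole call.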