KeyError: "usage" for gpt-3.5-turbo-16k

I’ve recently been getting KeyErrors when using gpt-3.5-turbo-16k where the “usage” key is completely missing from the response object. What’s weird is that this happens inconsistently, without any changes to the code:

headers = {
    "Authorization": await get_azure_token(),
    "OCP-Apim-Subscription-key": OPENAI_KEYS[model],
}

# Payload contains everything we're passing to OpenAI and must
# conform to their API.
payload = {
    "messages": [message.model_dump() for message in messages],
    "temperature": kwargs.get("temperature", 1),
}

async with httpx.AsyncClient(
    verify=False, follow_redirects=True, timeout=360
) as client:
    # OPENAI_URLS maps each model to Azure's chat completion endpoint URL
    resp = await client.post(OPENAI_URLS[model], json=payload, headers=headers)

Results in:

Traceback (most recent call last):
KeyError: 'usage'

What’s weird is that this is happening exclusively with 3.5; my GPT-4 calls are working as expected.

Edit: added the model call code to the post.

Quite odd.

Your traceback only lets us guess at how you’re accessing the API endpoint, since you haven’t shared the surrounding code.

  • You are not streaming responses under any circumstance, are you?

(A streamed response carries no usage.)

  • Are you working with the openai module’s response object or some other pydantic return? Whose .json() method is this?

(PydanticDeprecatedSince20: The `json` method is deprecated; use `model_dump_json` instead. Deprecated in Pydantic V2.0, to be removed in V3.0. See the Pydantic V2 Migration Guide.)

Getting usage out of the openai module’s awaited response:
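A minimal sketch of that attribute access, assuming the openai>=1.0 Python SDK (the live call is left in comments because it needs credentials; a stand-in object shows the shape of `.usage`):

```python
# Sketch, assuming the openai>=1.0 Python SDK: the awaited chat completion
# returns an object whose .usage attribute carries the token counts, e.g.
#
#   response = await client.chat.completions.create(model=..., messages=...)
#   tokens = response.usage.total_tokens
#
# Demonstrated with a stand-in object, since a live call needs credentials:
from types import SimpleNamespace

response = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=12, completion_tokens=5, total_tokens=17)
)
print(response.usage.total_tokens)
```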


I suggest:

  1. Log the entire response object (or the raw request content) to find out whether the API is actually omitting usage.

  2. A fallback:
    prompt_tokens = resp.json()["usage"]["prompt_tokens"] if "usage" in resp.json() else -1
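A defensive version of that fallback, sketched against an illustrative response body that lacks a usage key (`data` stands in for resp.json(); names are illustrative):

```python
# Illustrative sketch: tolerate a missing or null "usage" key instead of
# raising KeyError. `data` stands in for resp.json().
data = {
    "choices": [
        {"message": {"content": "Two plus two equals four.", "role": "assistant"}}
    ],
    # note: no "usage" key, as in the problematic Azure responses
}

usage = data.get("usage") or {}  # handles both a missing key and usage: null
prompt_tokens = usage.get("prompt_tokens", -1)
completion_tokens = usage.get("completion_tokens", -1)
print(prompt_tokens, completion_tokens)
```

The `or {}` matters: some responses reportedly return usage as null, which a plain `.get("usage", {})` would pass through as None.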

Thanks for the feedback. I’m using Azure’s chat completion endpoint (code added to original post), and calling .json() on the response object. I’m not streaming.

Output of resp.json():

{'choices': [{'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}, 'finish_reason': 'stop', 'index': 0, 'message': {'content': 'Two plus two equals four.', 'role': 'assistant'}}], 'created': 0, 'id': '', 'model': '', 'object': '', 'prompt_filter_results': [{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}]}

You can affect things a bit more there: you can re-deploy the AI model, even picking a different datacenter.

OpenAI can’t fix bugs in MS products…


We started seeing the same thing this morning. At first it was intermittent, but now I’m seeing usage always returned as null. It seems to affect Azure gpt-4-32k, while gpt-4-turbo on OpenAI direct is still returning usage.

I ended up deploying a patch to look for that and skip collecting usage.

I’m not using the streaming API.


Hmm, that’s strange. gpt-4-32k is still returning usage for me. Kind of a bummer, because we need usage to track costs.

Thanks for the assistance. I’ll forward this issue to Microsoft.

With chat completions, you are in control of the input and output, and can measure what is sent and received yourself.

tiktoken is a token-counting library.

Then you can also have the higher satisfaction of a streamed response.
