I’ve recently been getting KeyErrors when using gpt-3.5-turbo-16k where the “usage” key is completely missing from the response object. What’s weird is that this happens inconsistently, without any changes to the code:
headers = {
    "Authorization": await get_azure_token(),
    "OCP-Apim-Subscription-key": OPENAI_KEYS[model],
}
# Payload contains everything we're passing to OpenAI and must
# conform to their API.
payload = {
    "messages": [message.model_dump() for message in messages],
    "temperature": kwargs.get("temperature", 1),
}
async with httpx.AsyncClient(
    verify=False, follow_redirects=True, timeout=360
) as client:
    # OPENAI_URLS is Azure's chat completion endpoint URL
    resp = await client.post(url=OPENAI_URLS[model], json=payload, headers=headers)
    # This is the line that raises KeyError when "usage" is missing from the body.
    prompt_tokens = resp.json()["usage"]["prompt_tokens"]
Since you haven't shared code, your traceback only lets us guess at how you're calling the API endpoint.
You're definitely not streaming responses under any circumstance?
(Streamed responses don't include usage.)
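(For illustration, a minimal sketch, not the OP's code, of what the same raw httpx call would look like with streaming turned on: the reply comes back as server-sent-event lines, and the chunks carry no usage block at all.)

import json

async def stream_chat(client, url, headers, payload) -> str:
    # Hypothetical variant of the raw call with "stream": True added.
    async with client.stream(
        "POST", url, json={**payload, "stream": True}, headers=headers
    ) as resp:
        text = ""
        async for line in resp.aiter_lines():
            if not line.startswith("data: ") or line == "data: [DONE]":
                continue
            chunk = json.loads(line[len("data: "):])
            # Chunks only carry "choices[].delta" fragments -- no token counts.
            for choice in chunk.get("choices", []):
                text += choice.get("delta", {}).get("content") or ""
        return text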
Are you working with the openai module's response object or some pydantic return? Whose .json() method is that?
(PydanticDeprecatedSince20: The json method is deprecated; use model_dump_json instead. Deprecated in Pydantic V2.0, to be removed in V3.0. See the Pydantic V2 Migration Guide.)
Getting usage out of openai module’s “await client.chat.completions.with_raw_response.create()”:
print(apiresponse.parse().usage.prompt_tokens)
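In fuller form, something like this with the openai v1 SDK pointed at Azure (the endpoint, key, api_version, and deployment name below are placeholders, not taken from this thread):

from openai import AsyncAzureOpenAI

client = AsyncAzureOpenAI(
    azure_endpoint="https://<resource>.openai.azure.com",  # placeholder
    api_key="<azure-openai-key>",                          # placeholder
    api_version="2023-07-01-preview",                      # assumption
)

async def prompt_tokens_for(messages: list[dict]) -> int:
    apiresponse = await client.chat.completions.with_raw_response.create(
        model="gpt-35-turbo-16k",  # Azure deployment name, placeholder
        messages=messages,
    )
    completion = apiresponse.parse()  # raw HTTP response -> ChatCompletion object
    print(completion.usage.prompt_tokens)
    return completion.usage.prompt_tokens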
I suggest:
- logging the entire response object or the raw response content, to find out whether the API is actually omitting usage (a sketch follows below);
- a fallback: prompt_tokens = resp.json()["usage"]["prompt_tokens"] if "usage" in resp.json() else -1
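Something along these lines, dropped into your httpx snippet (the logger name and the -1 sentinel are my own choices, not from your code):

import logging

logger = logging.getLogger("chat_proxy")  # assumed logger name

async def fetch_prompt_tokens(client, url, payload, headers) -> int:
    resp = await client.post(url=url, json=payload, headers=headers)
    data = resp.json()

    if "usage" not in data:
        # Log the raw body so you can confirm whether Azure really omitted "usage".
        logger.warning(
            "usage missing from response: status=%s body=%s",
            resp.status_code,
            resp.text,
        )
        return -1  # fallback sentinel instead of a KeyError

    return data["usage"]["prompt_tokens"]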
Thanks for the feedback. I’m using Azure’s chat completion endpoint (code added to original post), and calling .json() on the response object. I’m not streaming.
We started seeing the same thing this morning - at first it was intermittent, but now usage is always returned as null. It seems to affect Azure gpt-4-32k, while gpt-4-turbo on OpenAI direct is still returning usage.
I ended up deploying a patch to look for that and skip collecting usage.
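For what it's worth, a minimal sketch of that kind of patch (the helper name is my own): treat usage as optional and tolerate it being null as well as missing.

def extract_usage(data: dict) -> tuple[int | None, int | None]:
    """Return (prompt_tokens, completion_tokens), or Nones when usage is absent or null."""
    usage = data.get("usage") or {}  # covers both a missing key and an explicit null
    return usage.get("prompt_tokens"), usage.get("completion_tokens")

Callers can then skip token accounting for a request whenever both values come back as None, instead of failing the whole call.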