We have been experiencing a complex issue when calling the chat-completion endpoint via the Python SDK:
Recently, we sometimes receive None when the response should be of type ChatCompletion.
The issue occurs only sometimes on exactly the same input, so it is only partially reproducible.
We call the API concurrently; in the problematic case, e.g. today, it was 100 concurrent requests.
models: gpt-4-turbo, gpt-4o
We are very far from our rate limits, at least in theory. I'm not sure how granular the rate limiting is, but even in that case we should receive an openai.RateLimitError.
Interestingly, I have never been able to reproduce this locally, but the issue has been occurring frequently across our GCP-deployed environments (FastAPI + GCP Cloud Run).
With some simplifications, our code looks like this:
import asyncio
from openai import AsyncOpenAI

async def generate_content():
    client = AsyncOpenAI()
    response = await client.chat.completions.create(
        model='gpt-4o',
        messages=...,  # actual messages elided
        stream=False,
    )
    return response.choices[0].message.content

# called from inside an async FastAPI handler
contents = await asyncio.gather(*(generate_content() for _ in range(100)))
According to Datadog traces and GCP Cloud Run logs, this sometimes fails with AttributeError: 'NoneType' object has no attribute 'choices'.
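In the meantime, a guarded variant (just a sketch; the logger setup and fallback behaviour are illustrative, not our production code) would at least record the bad case instead of blowing up:

import logging

from openai import APIStatusError, AsyncOpenAI

logger = logging.getLogger(__name__)

async def generate_content_guarded():
    client = AsyncOpenAI()
    try:
        response = await client.chat.completions.create(
            model='gpt-4o',
            messages=...,  # same messages as above
            stream=False,
        )
    except APIStatusError as e:
        # Non-2xx responses surface as exceptions carrying the HTTP status
        logger.error("OpenAI returned HTTP %s: %s", e.status_code, e.response.text)
        return None
    if response is None:
        # The case we are chasing: no exception raised, but no ChatCompletion either
        logger.error("chat.completions.create returned None")
        return None
    return response.choices[0].message.content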
Any ideas on potential issues or debugging directions?
This makes me think it's related to asyncio, maybe? Could that be sending back 'None' after an unseen error code from OpenAI? Or are you getting an actual OAI error code somewhere?
I am actually calling response.choices[0].message.content inside the function, and according to Datadog traces, that is the call that fails with AttributeError: 'NoneType' object has no attribute 'choices'.
As I cannot reproduce this locally, I have to rely on Datadog (unless I deploy some additional monitoring/logging).
Everything is super simple if you just use LangChain. There's not much of a learning curve. You can easily copy someone else's code and start using it without understanding much. Here's my LangChain code from my own chatbot, for reference.
Thanks - we are moving to LangChain (LangGraph, to be precise), but that doesn't fix the error here and now for our legacy code and some existing clients.
Gotcha. With 100 concurrent calls and only intermittent failures, I'd just put in a ton of logging so you can catch the exact case where it fails and log exactly what was sent on that call, what HTTP response code you got back, and the raw stream you got back.
Worst case, you might have to clone the OpenAI Python client code and call that, so you can decorate it with as much diagnostic logging as you need.
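Before going as far as cloning the SDK, a lighter option might be its raw-response wrapper (a sketch, assuming openai-python v1.x, where with_raw_response exposes the HTTP status and headers alongside the parsed object):

from openai import AsyncOpenAI

client = AsyncOpenAI()

async def generate_content_logged():
    # with_raw_response gives access to the raw HTTP response before parsing
    raw = await client.chat.completions.with_raw_response.create(
        model='gpt-4o',
        messages=...,  # the real messages go here
        stream=False,
    )
    # Log the status code and OpenAI's per-request id for correlating with their side
    print(raw.status_code, raw.headers.get('x-request-id'))
    completion = raw.parse()  # parse into the usual ChatCompletion object
    return completion.choices[0].message.content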
I am also experiencing this with Batch Processing, using GPT-4o for some topic modeling. I have specifically instructed in the prompt not to return None or NoneType, and it keeps doing it; but if I do a synchronous API call for all of them with the same prompt, it works.
I added some logging, and our issue is coming from the API returning '500' HTTP status codes under a lot of concurrent calls that are nonetheless still below our rate limit. We haven't investigated further because this is not high priority for us currently and we may switch to other providers, but we may revisit.
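In case someone lands here with the same 500s, what we would try first (a sketch; the retry count and semaphore size are illustrative) is raising the SDK's built-in retries and capping concurrency:

import asyncio

from openai import AsyncOpenAI, InternalServerError

# The SDK retries 5xx responses with exponential backoff; raise the default here
client = AsyncOpenAI(max_retries=5)
semaphore = asyncio.Semaphore(20)  # cap in-flight requests

async def generate_content_capped():
    async with semaphore:
        try:
            response = await client.chat.completions.create(
                model='gpt-4o',
                messages=...,  # as in the snippets above
                stream=False,
            )
        except InternalServerError:
            # Only raised once the SDK's retries are exhausted
            return None
        return response.choices[0].message.content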