Python API: chat.completions.create returns None

We have been experiencing a hard-to-pin-down issue when calling the chat completions endpoint via the Python SDK:

  • Recently, we sometimes receive None where the response should be of type ChatCompletion
  • The issue occurs only sometimes on exactly the same input, so it is only partially reproducible
  • We call the API concurrently; in the problematic case today, for example, with 100 concurrent requests
  • Models: gpt-4-turbo and gpt-4o
  • We are well below our rate limits, at least in theory. I am not sure how granular the accounting is, but even if we were hitting a limit, we should receive an openai.RateLimitError, not None
  • Interestingly, I have never been able to reproduce this locally, but the issue has been occurring frequently across our GCP-deployed environments (FastAPI on GCP Cloud Run)

With some simplifications, our code looks like this:

import asyncio

from openai import AsyncOpenAI


async def generate_content():
    client = AsyncOpenAI()
    response = await client.chat.completions.create(
        model='gpt-4o',
        messages=...,  # simplified; the real messages are omitted here
        stream=False,
    )
    return response.choices[0].message.content


# Inside an async context (e.g. a FastAPI handler): 100 concurrent requests
contents = await asyncio.gather(*(generate_content() for _ in range(100)))

According to Datadog traces and GCP Cloud Run logs, this sometimes fails with AttributeError: 'NoneType' object has no attribute 'choices'
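A guard along these lines (just a sketch; the retry count and backoff are invented, and messages is elided as in the snippet above) would at least turn the crash into a logged, retried event:

import asyncio
import logging

from openai import AsyncOpenAI

log = logging.getLogger(__name__)
client = AsyncOpenAI()


async def generate_content_guarded(attempts: int = 3):
    # Hypothetical guard: retry when the call hands back None
    # instead of a ChatCompletion
    for attempt in range(attempts):
        response = await client.chat.completions.create(
            model='gpt-4o',
            messages=...,  # same simplified payload as above
            stream=False,
        )
        if response is not None:
            return response.choices[0].message.content
        log.error('got None on attempt %d, retrying', attempt + 1)
        await asyncio.sleep(2 ** attempt)  # simple exponential backoff
    raise RuntimeError('chat.completions.create returned None on every attempt')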

Any ideas on potential issues or debugging directions?

Many thanks!


Welcome to the dev forum!

This makes me think it's related to asyncio, maybe? Could that be sending back "None" after an unseen error code from OpenAI? Or are you getting an actual OAI error code somewhere?


Thanks for the welcome! And thanks for the idea, but very likely no:

  • I am actually calling response.choices[0].message.content inside the function, and that call is what fails, according to Datadog traces, with AttributeError: 'NoneType' object has no attribute 'choices'
  • As I cannot reproduce this locally, I have to rely on Datadog (unless I deploy some additional monitoring / logging)
  • There is no error code from OpenAI that I am aware of

Everything is super simple if you just use LangChain. There's not much of a learning curve. You can easily copy someone else's code and start using it without understanding much. Here's my LangChain code from my own chatbot, for reference:


Thanks - we are moving to LangChain (LangGraph, to be precise), but that doesn't fix the error here and now for our legacy code and some existing clients.

Gotcha. With 100 concurrent calls and only intermittent failures, I'd put in a ton of logging, so that you can catch the exact case where it fails and record exactly what was sent on that call, what HTTP response code came back, and the raw response you got back.
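For instance, something along these lines (a sketch assuming a v1.x openai SDK, where the client exposes a with_raw_response accessor; the function name is illustrative) captures the status code and request ID for every call:

import logging

from openai import AsyncOpenAI

log = logging.getLogger(__name__)
client = AsyncOpenAI()


async def generate_content_logged():
    # .with_raw_response exposes the HTTP layer alongside the parsed object
    raw = await client.chat.completions.with_raw_response.create(
        model='gpt-4o',
        messages=...,  # elided, as in the original snippet
        stream=False,
    )
    log.info('status=%s request_id=%s', raw.status_code,
             raw.headers.get('x-request-id'))
    completion = raw.parse()  # the usual ChatCompletion object
    if completion is None:
        log.error('parsed completion is None for request %s',
                  raw.headers.get('x-request-id'))
    return completion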

Worst-case scenario, you might have to make a clone of the OpenAI Python code and call that, so you can decorate it with as many diagnostic logs as you need.
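A lighter alternative to forking, assuming a v1.x SDK that accepts a custom http_client, is to attach httpx event hooks; log_response here is a hypothetical hook:

import httpx
from openai import AsyncOpenAI


async def log_response(response: httpx.Response) -> None:
    # Called for every HTTP response the SDK receives,
    # including SDK-level retries
    print(response.request.url, response.status_code)


# Hand the SDK a custom httpx client instead of cloning the SDK itself
client = AsyncOpenAI(
    http_client=httpx.AsyncClient(event_hooks={'response': [log_response]})
)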


In your deployment environment, are you able to run asyncio in debug mode to get more verbose messages in your logs?
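For reference, a minimal way to flip that on looks like this (on Cloud Run, where the server framework owns the event loop, setting PYTHONASYNCIODEBUG=1 in the environment is likely the practical route):

import asyncio
import logging

# asyncio's debug output goes through the 'asyncio' logger
logging.basicConfig(level=logging.DEBUG)


async def main():
    ...  # the gather(...) fan-out from the snippet above


# debug=True is equivalent to setting PYTHONASYNCIODEBUG=1
asyncio.run(main(), debug=True)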

I am also experiencing this with batch processing, using GPT-4o for some topic modeling. I have specifically instructed in the prompt not to return None or NoneType, and it keeps doing it; but if I make synchronous API calls for all of them with the same prompt, it works.

I added some logging, and our issue turns out to come from the API returning a 500 HTTP status code under many concurrent calls that are nevertheless still below our rate limit. We haven't investigated further because this is not a high priority for us right now and we may switch to other providers, but we may revisit.
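If the 500s turn out to be transient, one cheap mitigation (a sketch, assuming a v1.x openai client) is to raise the SDK's built-in retry budget, since it already retries 5xx responses with backoff:

from openai import AsyncOpenAI

# The default is 2 retries; raise it client-wide
client = AsyncOpenAI(max_retries=5)

# Or raise it for a single call, inside an async context
response = await client.with_options(max_retries=5).chat.completions.create(
    model='gpt-4o',
    messages=...,  # elided
    stream=False,
)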