We have been experiencing a complex issue when calling the chat-completion endpoint via the Python SDK:
Recently, we sometimes receive None when the response should be of type ChatCompletion.
The issue occurs only sometimes on exactly the same input, so it is only partially reproducible.
We call the API concurrently; in the problematic case, e.g. today, it was 100 concurrent requests.
models: gpt-4-turbo, gpt-4o
We are very far from our rate limits, at least in theory. I'm not sure how granular the rate limiting is, but even in that case we should receive an openai.RateLimitError.
Interestingly, I have never been able to reproduce this locally, but the issue has been occurring frequently across our GCP-deployed environments (FastAPI + GCP Cloud Run).
With some simplifications, our code looks like this:
import asyncio
from openai import AsyncOpenAI

async def generate_content():
    client = AsyncOpenAI()
    response = await client.chat.completions.create(
        model='gpt-4o',
        messages=...,  # actual messages elided
        stream=False,
    )
    return response.choices[0].message.content

# called from inside an async FastAPI handler
contents = await asyncio.gather(*(generate_content() for _ in range(100)))
According to Datadog traces and GCP Cloud Run logs, this sometimes fails with AttributeError: 'NoneType' object has no attribute 'choices'.
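In the meantime, a guarded variant (just a sketch; the logger setup and fallback behaviour are illustrative, not our production code) would at least record the bad case instead of blowing up:

import logging

from openai import APIStatusError, AsyncOpenAI

logger = logging.getLogger(__name__)

async def generate_content_guarded():
    client = AsyncOpenAI()
    try:
        response = await client.chat.completions.create(
            model='gpt-4o',
            messages=...,  # same messages as above
            stream=False,
        )
    except APIStatusError as e:
        # Non-2xx responses surface as exceptions carrying the HTTP status
        logger.error("OpenAI returned HTTP %s: %s", e.status_code, e.response.text)
        return None
    if response is None:
        # The case we are chasing: no exception raised, but no ChatCompletion either
        logger.error("chat.completions.create returned None")
        return None
    return response.choices[0].message.content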
Any ideas on potential issues or debugging directions?
This makes me think it's related to asyncio, maybe? Could that be sending back 'None' after an unseen error code from OpenAI? Or are you getting an actual OAI error code somewhere?
I am actually calling response.choices[0].message.content inside the function, and according to Datadog traces, that is the call that fails with AttributeError: 'NoneType' object has no attribute 'choices'.
As I cannot reproduce this locally, I have to rely on Datadog (unless I deploy some additional monitoring/logging).
Everything is super simple if you just use LangChain. There's not much of a learning curve. You can easily copy someone else's code and start using it without understanding much. Here's my LangChain code from my own chatbot, for reference.
Thanks - we are moving to LangChain (LangGraph, to be precise), but that doesn't fix the error here and now for our legacy code and some existing clients.
Gotcha. With 100 concurrent calls and only intermittent failures, I'd just put in a ton of logging so you can catch the exact case where it fails and log exactly what was sent on that call, what HTTP response code you got back, and the raw stream you got back.
Worst case, you might have to clone the OpenAI Python client code and call that, so you can decorate it with as much diagnostic logging as you need.
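Before going as far as cloning the SDK, a lighter option might be its raw-response wrapper (a sketch, assuming openai-python v1.x, where with_raw_response exposes the HTTP status and headers alongside the parsed object):

from openai import AsyncOpenAI

client = AsyncOpenAI()

async def generate_content_logged():
    # with_raw_response gives access to the raw HTTP response before parsing
    raw = await client.chat.completions.with_raw_response.create(
        model='gpt-4o',
        messages=...,  # the real messages go here
        stream=False,
    )
    # Log the status code and OpenAI's per-request id for correlating with their side
    print(raw.status_code, raw.headers.get('x-request-id'))
    completion = raw.parse()  # parse into the usual ChatCompletion object
    return completion.choices[0].message.content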
I am also experiencing this with Batch Processing, using GPT-4o for some topic modeling. I have specifically instructed in the prompt not to return None or NoneType, and it keeps doing it; but if I do a synchronous API call for all of them with the same prompt, it works.
I added some logging, and our issue is coming from the API returning '500' HTTP status codes under a lot of concurrent calls that are nonetheless still below our rate limit. We haven't investigated further because this is not high priority for us currently and we may switch to other providers, but we may revisit.
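In case someone lands here with the same 500s, what we would try first (a sketch; the retry count and semaphore size are illustrative) is raising the SDK's built-in retries and capping concurrency:

import asyncio

from openai import AsyncOpenAI, InternalServerError

# The SDK retries 5xx responses with exponential backoff; raise the default here
client = AsyncOpenAI(max_retries=5)
semaphore = asyncio.Semaphore(20)  # cap in-flight requests

async def generate_content_capped():
    async with semaphore:
        try:
            response = await client.chat.completions.create(
                model='gpt-4o',
                messages=...,  # as in the snippets above
                stream=False,
            )
        except InternalServerError:
            # Only raised once the SDK's retries are exhausted
            return None
        return response.choices[0].message.content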