For some reason, every implementation I have ever done seems to suffer from the same problem, and it doesn’t seem to be common on the internet, so, to paraphrase a billionaire, the problem is likely me.
Observation:
I get seemingly clustered APITimeoutErrors: tens of thousands of queries can succeed, then five will fail. It seems fairly reproducible, in the sense that when I repeat those queries they are much more likely (although not 100%) to fail with the same error. The failing queries have nothing in common, and they’re not particularly complicated. They use structured outputs with a JSON schema, but otherwise they’re pretty basic. I’ve definitely observed it with gpt-4.1 and probably also with gpt-4.1-mini.
Setup:
I use the Python openai client library to call the OpenAI Chat Completions API through the AsyncOpenAI client. I do this from three implementations hosted across serverless GCP offerings (e.g. Cloud Run) and Kubernetes, and I always observe the same phenomenon.
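A minimal sketch of the fan-out pattern described above, assuming requests are launched with asyncio.gather and capped by an asyncio.Semaphore. The cap value and the call_model coroutine are hypothetical; call_model stands in for an AsyncOpenAI chat.completions.create call so the example runs without network access.

```python
import asyncio


async def call_model(prompt: str) -> str:
    """Hypothetical stand-in for `await client.chat.completions.create(...)`."""
    await asyncio.sleep(0.01)  # pretend network latency
    return prompt.upper()


async def run_batch(prompts: list[str], max_in_flight: int = 50) -> tuple[list[str], int]:
    """Send every prompt concurrently, but never more than `max_in_flight` at once.

    Returns the results (in input order) and the peak number of in-flight calls.
    """
    sem = asyncio.Semaphore(max_in_flight)
    in_flight = 0
    peak = 0

    async def one(prompt: str) -> str:
        nonlocal in_flight, peak
        async with sem:  # waits here once `max_in_flight` calls are pending
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await call_model(prompt)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(one(p) for p in prompts))  # preserves input order
    return list(results), peak


if __name__ == "__main__":
    results, peak = asyncio.run(run_batch([f"prompt {i}" for i in range(200)], max_in_flight=20))
    print(len(results), peak)
```

Capping in-flight calls is a design choice, not something the client does for you: it bounds how many open requests are waiting on the event loop at any moment.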
Hmm, could you run those inputs through the moderations endpoint and see if anything gets flagged?
The moderations endpoint is free to use and should be used anyway.
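```javascript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// userInput is the prompt text being checked (top-level await needs an ES module)
const response = await openai.moderations.create({
  model: "omni-moderation-latest",
  input: userInput,
});
if (response.results[0].flagged) {
  // do whatever you must to get notified + userInput
}
```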
Do negative moderation results give APITimeoutErrors?
I think it was something infrastructure-related at the moment I was writing the post. I tried the same batch of 100,000 prompts an hour later and it completed without errors. It’s just weird to me that no one else seems to experience this, because I see it weekly.
I log the requests that fail as a JSON blob. That blob (including the response format) is 4k to 10k characters, so not very complicated in the grand scheme of things. My timeout is 3 minutes, with a separate 5-second connect timeout.
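For reference, a timeout split like the one described (3 minutes overall, 5 seconds to connect) can be expressed in the Python client by passing an httpx.Timeout. This is a configuration sketch, assuming the openai and httpx packages are installed and OPENAI_API_KEY is set; the retry count shown is the library default, and the 300-second override is only an illustrative value.

```python
import httpx
from openai import AsyncOpenAI

# Reads OPENAI_API_KEY from the environment.
client = AsyncOpenAI(
    # 180 s for read/write/pool, 5 s to establish the connection.
    timeout=httpx.Timeout(180.0, connect=5.0),
    # Timeouts and connection errors are retried; 2 is the library default.
    max_retries=2,
)

# Per-request override without rebuilding the client (illustrative value).
slower_client = client.with_options(timeout=httpx.Timeout(300.0, connect=5.0))
```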
I don’t think so... Looking more carefully at the call stack (I didn’t initially notice they use raise APITimeoutError from err), they appear to be httpx.ReadTimeouts. Three minutes already seems pretty high for a ReadTimeout, but maybe I need to increase it.
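Since the client raises APITimeoutError from the underlying httpx exception, walking the __cause__ chain shows which timeout actually fired (a ReadTimeout versus a ConnectTimeout, which correspond to the 3-minute and 5-second settings). A minimal sketch: ReadTimeout and APITimeoutError below are hypothetical stand-in classes so the example runs on the standard library alone; in real code you would pass the caught openai.APITimeoutError to exception_chain and log the result alongside the request blob.

```python
class ReadTimeout(Exception):
    """Hypothetical stand-in for httpx.ReadTimeout."""


class APITimeoutError(Exception):
    """Hypothetical stand-in for openai.APITimeoutError."""


def exception_chain(exc: BaseException) -> list[str]:
    """Return exception type names from the outermost error down to the root cause."""
    names = []
    seen = set()
    current = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        names.append(type(current).__name__)
        # `raise X from err` sets __cause__; implicit chaining sets __context__.
        current = current.__cause__ or current.__context__
    return names


def send_request() -> None:
    """Mimics a client wrapping a low-level timeout with `raise ... from err`."""
    try:
        raise ReadTimeout("timed out while reading the response")
    except ReadTimeout as err:
        raise APITimeoutError("Request timed out.") from err


if __name__ == "__main__":
    try:
        send_request()
    except APITimeoutError as exc:
        print(" <- ".join(exception_chain(exc)))
```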
Based on the underlying httpcore code, I think httpx will raise a ReadTimeout if I send a request and my code takes more than 3 minutes to get back to the async task, regardless of whether that task already has a result waiting. It could be that the synchronous processing I do when receiving results, combined with a large number of concurrent async requests, is causing the timeouts: I slowly fall behind and can’t read from the async task in time.
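The hypothesis above can be checked with a standard-library model. It does not reproduce httpx's timeout logic, only the event-loop starvation that logic would depend on: a probe coroutine asks to wake up after 50 ms while result handlers do blocking work. process_result_sync is a hypothetical stand-in for whatever synchronous post-processing happens per result, simulated with time.sleep. When the handlers block the loop, the probe wakes up hundreds of milliseconds late; offloading the same work with asyncio.to_thread keeps the loop responsive.

```python
import asyncio
import time


async def measure_wakeup_delay(delay: float) -> float:
    """Sleep for `delay` seconds; return how much later than requested we actually woke."""
    start = time.perf_counter()
    await asyncio.sleep(delay)
    return time.perf_counter() - start - delay


def process_result_sync(seconds: float) -> None:
    """Hypothetical synchronous post-processing of one API result."""
    time.sleep(seconds)  # blocks whichever thread calls it


async def handle_result(offload: bool, seconds: float) -> None:
    if offload:
        # Runs in a worker thread; the event loop keeps servicing other tasks.
        await asyncio.to_thread(process_result_sync, seconds)
    else:
        # Runs on the event-loop thread; nothing else can run until it returns.
        process_result_sync(seconds)


async def run(offload: bool, n_results: int = 3, work_s: float = 0.2) -> float:
    probe = asyncio.create_task(measure_wakeup_delay(0.05))
    await asyncio.sleep(0)  # let the probe start its 50 ms timer first
    await asyncio.gather(*(handle_result(offload, work_s) for _ in range(n_results)))
    return await probe


if __name__ == "__main__":
    print(f"blocking:  probe woke {asyncio.run(run(offload=False)):.3f}s late")
    print(f"offloaded: probe woke {asyncio.run(run(offload=True)):.3f}s late")
```

If production behaves like the blocking variant, running the event loop in debug mode (asyncio.run(main(), debug=True) or PYTHONASYNCIODEBUG=1) makes asyncio log any callback that runs longer than loop.slow_callback_duration (100 ms by default), which points at the code holding the loop.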