APITimeoutError vs. user error

For some reason, every implementation I have ever done seems to suffer from the same problem, and it doesn’t seem common on the internet, so, to paraphrase a billionaire, the problem is likely me.

Observation:

I get seemingly clustered APITimeoutErrors. Tens of thousands of queries can be fine, then five will fail. It’s fairly reliable, in the sense that when I repeat the queries they are much more likely (although not 100%) to fail with the same error. There’s no commonality between the queries, and they’re not particularly complicated. They use structured outputs and specify a JSON schema, but otherwise they’re pretty basic. I’ve definitely observed it with gpt-4.1 and probably also with gpt-4.1-mini.

Setup:

I use the Python openai client library to access the OpenAI chat completions API via the AsyncOpenAI client object. I run this from three implementations hosted across serverless GCP offerings (e.g. Cloud Run) and Kubernetes, and I always observe the same phenomenon.

Anyone have a suggestion?

Hmm, could you try the moderations endpoint on that input and see if it violates anything?
The moderations endpoint is free to use and should be used anyway.

import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const response = await openai.moderations.create({
  model: "omni-moderation-latest",
  input: userInput,
});

if (response.results[0].flagged) {
  // do whatever you must to get notified + userInput
}

Do negative moderation results give APITimeoutErrors?

I think it was something infrastructure-related at the moment I was writing the post. I tried the same batch of 100,000 prompts an hour later and it completed without errors. It’s just weird to me that no one else seems to experience this, because I see it weekly.

I log the requests that fail as a JSON blob. That blob (including the response format) is 4–10k characters, so not very complicated in the grand scheme of things. My overall request timeout is 3 minutes (the connect timeout is 5 seconds).


You can ponder: has the AI gone into a loop of non-stop production on that type of input?

Have you capped max_output_tokens to just above the maximum that any prompt could ever produce?

Will “stream”:true, with a constant stream of network tokens, avoid the error?


Hmm, sounds like a lot of parallel stuff. Maybe you ran into a rate limit and have faulty retry logic?

I don’t think so . . . Looking more carefully at the call stack (I didn’t initially notice they used raise APITimeoutError from err), the failures appear to be httpx.ReadTimeouts. Three minutes for a ReadTimeout already seems pretty high, but maybe I need to bump it up.

Based on the underlying httpcore code, I think httpx will throw a ReadTimeout if I send a request and it takes my code more than 3 minutes to get back to the async task, regardless of whether that task has already yielded a result. It could be that I’m doing some synchronous processing when receiving results which, combined with the large number of concurrent requests, is causing the timeouts: I just slowly fall behind and can’t read from the async task in time.
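One way to test that theory is to keep the receive path non-blocking by pushing heavy synchronous post-processing off the event loop. A sketch, where parse_result is a stand-in for whatever sync work I do per response:

```python
import asyncio
import json

def parse_result(raw: str) -> dict:
    # Stand-in for the synchronous post-processing that may be
    # starving the event loop when many requests complete at once.
    return json.loads(raw)

async def handle(raw: str) -> dict:
    # Off-load the sync work to a thread so the loop stays free to
    # read from other in-flight responses before their timeouts fire.
    return await asyncio.to_thread(parse_result, raw)

# Usage (inside an async context):
#   results = await asyncio.gather(*(handle(r) for r in raw_results))
```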