APITimeoutError vs. user error

For some reason, every implementation I have ever done seems to suffer from the same problem, and it doesn’t seem common on the internet, so, to paraphrase a billionaire, the problem is likely me.

Observation:

I get seemingly clustered APITimeoutErrors. Tens of thousands of queries can be fine, then five will fail. It’s fairly reliable, in the sense that when I repeat the queries they are much more likely (although not 100%) to fail with the same error. There’s no commonality between the queries, and they’re not particularly complicated. They use structured outputs and specify a JSON schema, but otherwise they’re pretty basic. I’ve definitely observed it with gpt-4.1 and probably also with gpt-4.1-mini.

Setup:

I use the Python openai client library to access the OpenAI chat completions API via the AsyncOpenAI client object. I run this from three implementations hosted across serverless GCP offerings (e.g. Cloud Run) and Kubernetes, and I always observe the same phenomenon.

Anyone have a suggestion?

Hmm, could you try the moderations endpoint on that input and see if it violates anything?
The moderations endpoint is free to use and should be used anyway.

import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const response = await openai.moderations.create({
  model: "omni-moderation-latest",
  input: userInput,
});

if (response.results[0].flagged) {
  // do whatever you must to get notified + userInput
}

Do negative moderation results give APITimeoutErrors?

I think it was something infrastructure-related at the moment I was writing the post. I tried the same batch of 100,000 prompts an hour later and it completed without errors. It’s just weird to me that no one else seems to experience this, because I see it weekly.

I log the requests that fail as a JSON blob. That blob (including the response format) is 4–10k characters, so not very complicated in the grand scheme of things. My overall request timeout is 3 minutes (the connect timeout is 5 seconds).


You can ponder: has the AI gone into a loop of non-stop production on that type of input?

Have you capped max_output_tokens to just above the maximum that any prompt could ever produce?

Will “stream”:true, with a constant stream of network tokens, avoid the error?


Hmm, sounds like a lot of parallel stuff. Maybe you ran into a rate limit and have faulty retry logic?

I don’t think so . . . Looking more carefully at the call stack (I didn’t initially notice they used raise APITimeoutError from err), the failures appear to be httpx.ReadTimeouts. Three minutes for a ReadTimeout already seems pretty high, but maybe I need to bump it up.

Based on the underlying httpcore code, I think httpx will throw a ReadTimeout if I send a request and it takes my code more than 3 minutes to get back to the async task, regardless of whether that task has already yielded a result. It could be that I’m doing some synchronous processing when receiving results which, combined with the large number of concurrent requests, is causing the timeouts: I just slowly fall behind and can’t read from the async task in time.
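One way to test that theory is to keep the receive path non-blocking by pushing heavy synchronous post-processing off the event loop. A sketch, where parse_result is a stand-in for whatever sync work I do per response:

```python
import asyncio
import json

def parse_result(raw: str) -> dict:
    # Stand-in for the synchronous post-processing that may be
    # starving the event loop when many requests complete at once.
    return json.loads(raw)

async def handle(raw: str) -> dict:
    # Off-load the sync work to a thread so the loop stays free to
    # read from other in-flight responses before their timeouts fire.
    return await asyncio.to_thread(parse_result, raw)

# Usage (inside an async context):
#   results = await asyncio.gather(*(handle(r) for r in raw_results))
```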