For the past few hours, all of my background requests have been taking several minutes to return a response. I checked the API Status page and everything looks normal. I also contacted OpenAI Support, but the only explanation I got was “variable latency”, which hasn’t been very helpful.
At first I assumed it might be related to queueing, so I switched the project to the Priority tier, but it didn’t improve anything. If anything, it seems to be getting worse as time passes…
Non-background requests are working fine, but my entire pipeline relies on sequential backend processing, which is why I use background mode + webhooks.
Is anyone else experiencing this issue with background requests right now?
When you say background requests, do you mean the lower cost “Batch” API calls? Or regular API calls? Do you have a code snippet that shows the setup and execution of the API call?
On the Responses API, you don’t have to keep a connection open while you wait for the output or the stream, and risk getting nothing if the connection closes. You can use the “background” parameter so that the connection closes immediately after the request is ingested.
To obtain the response, you can either poll the status of the response ID or subscribe to a webhook that tells you when to check.
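As a minimal sketch of the polling side (not an official recipe), the loop below keeps retrieving a response until its status is no longer pending. The helper name `wait_for_response`, the injected `retrieve` callable, and the pending status names `queued` and `in_progress` are assumptions for illustration; check them against the current API reference.

```python
import time
from typing import Any, Callable

# Statuses treated as "still running". Assumed values, not taken from this thread.
PENDING = {"queued", "in_progress"}


def wait_for_response(
    retrieve: Callable[[str], Any],
    response_id: str,
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call retrieve(response_id) until the status leaves PENDING.

    `retrieve` is injected so the loop can be exercised without network access.
    Raises TimeoutError if the response is still pending after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        resp = retrieve(response_id)
        if resp.status not in PENDING:
            return resp  # completed, failed, cancelled, etc.
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{response_id} still {resp.status} after {timeout}s"
            )
        sleep(interval)
```

With the official Python SDK this would presumably be called as `wait_for_response(client.responses.retrieve, job.id)` after creating `job` with `background=True`; treat those call names as assumptions to verify. A webhook replaces the loop entirely, at the cost of running an endpoint that can receive it.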
Should OpenAI be charging the same price when the parameter gives you terrible service? Absolutely not.
One should compare the latency of a background request against a normal API request with the same model and input, to see if the observation is true and repeatable, because gpt-5 simply thinks for way longer than anyone could expect.
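A rough way to make that comparison repeatable is to time both paths several times and look at the medians rather than a single run. This is only a sketch: `sync_call` and `background_call` are hypothetical zero-argument callables that you would wire to your own synchronous request and to your background request plus wait-for-completion, using the same model and input.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_calls(fn: Callable[[], object], runs: int = 5) -> List[float]:
    """Return the wall-clock duration in seconds of each of `runs` calls to fn."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def compare_latency(
    sync_call: Callable[[], object],
    background_call: Callable[[], object],
    runs: int = 5,
) -> Dict[str, float]:
    """Time both callables and report their median durations and the gap."""
    sync_median = statistics.median(time_calls(sync_call, runs))
    bg_median = statistics.median(time_calls(background_call, runs))
    return {
        "sync_median_s": sync_median,
        "background_median_s": bg_median,
        "gap_s": bg_median - sync_median,
    }
```

If `gap_s` stays large across runs with identical inputs, the slowdown is more likely in the background path than in the model's thinking time.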
Hey StroeAndre, can you please confirm whether you are still facing latency issues with background requests? If so, please share the request ID with us. We are more than happy to help. Thank you!
Currently, the time to first token you receive from a background response is higher than what you receive from a synchronous one. We are working to reduce this latency gap in the coming weeks.
Hey @OpenAI_Support, can you please confirm whether you are still working to reduce this latency gap? If yes, can you please share the completion date? If not, please share what quality degradation one may face that is recognized but currently has no mitigation.
Then, you have “streaming” of the background response. When attempted, the stream connection was being terminated within five minutes. The same documentation page used to say “resuming the background streaming is coming”, but that promise of a future feature that could make this useful is now gone.
Thanks for flagging this. The latency gap for background responses is still a known tradeoff, and we don’t have a completion date to share right now. I’m not aware of a model quality degradation specific to background mode; the main difference is request lifecycle and time-to-first-token, not output quality.
For long-running tasks, the most reliable pattern is still to use background mode with polling, or background streaming with reconnect via `starting_after` if your client can support it. Stream disconnects at around 5 minutes are typically caused by idle client/proxy timeouts rather than a hard response limit, so if you’re seeing that consistently, polling is the safer option today.
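For anyone trying the reconnect route, here is a hedged sketch of the cursor bookkeeping. It assumes each stream event carries a `sequence_number` and a `type`, and that the stream can be re-opened from a cursor via `starting_after`. The `open_stream` callable, the terminal event names, and the exception type in `retry_on` are placeholders to adapt to your client, not confirmed SDK behavior.

```python
from typing import Any, Callable, Iterable, Iterator, Optional, Tuple, Type

# Event types assumed to mark the end of a response stream.
TERMINAL_TYPES = {"response.completed", "response.failed", "response.incomplete"}


def resumable_events(
    open_stream: Callable[[Optional[int]], Iterable[Any]],
    max_reconnects: int = 5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
) -> Iterator[Any]:
    """Yield events, re-opening the stream after the last seen sequence number.

    open_stream(cursor) is a hypothetical callable: it should start a fresh
    stream when cursor is None, else resume with starting_after=cursor.
    """
    cursor: Optional[int] = None
    reconnects = 0
    while True:
        try:
            for event in open_stream(cursor):
                cursor = event.sequence_number  # remember where we got to
                yield event
                if event.type in TERMINAL_TYPES:
                    return
        except retry_on:
            pass  # dropped connection: fall through and reconnect
        # Reached after a drop, or after the stream ended without a terminal event.
        reconnects += 1
        if reconnects > max_reconnects:
            raise RuntimeError(f"gave up after {max_reconnects} reconnects")
```

If the disconnects keep happening at the same interval, the polling approach avoids the problem altogether, since no connection has to stay open.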