Best practices for handling long queue times with OpenAI's Responses API

Hi everyone,

When using OpenAI’s Responses API with deep research models (like o3-deep-research and o3-pro), long queue times can be an issue. I’ve seen our queue times exceed 45 minutes.

What’s the recommended approach if a job seems stuck in a queue?
If a job hasn’t completed after 45 minutes, is it advisable to abandon it and resubmit an identical job (hoping for a shorter queue), or is there a better best practice for handling this?

I realize that abandoning a job doesn’t cancel it, so there’s a risk of being charged for multiple completions if both eventually finish. Is there any API-level method for canceling a pending/running job to prevent this?

What strategies do others use to manage long queue times other than simply waiting? Any advice on avoiding duplicate charges would be much appreciated.

Thanks for your insights!

There are various options, the simplest being background mode or the Batch API.
For a more sophisticated approach, you can use webhooks.

A background request can be streamed, polled, or set up to trigger a webhook notification when done. It can also be cancelled.
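To tie the polling and cancellation together, here is a minimal sketch of a "poll with a deadline, cancel on timeout" loop. It assumes the OpenAI Python SDK's Responses API shape (`client.responses.create(..., background=True)` to submit, `client.responses.retrieve(id)` to poll, `client.responses.cancel(id)` to cancel); the function name and timeout values are my own illustration, not an official recipe.

```python
import time


def run_with_timeout(client, response_id, timeout_s=45 * 60, poll_s=30):
    """Poll a background response until it finishes; cancel it on timeout.

    Cancelling server-side (rather than abandoning the job and resubmitting
    an identical one) means only one job can ever bill you.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        resp = client.responses.retrieve(response_id)
        # Stop on any terminal status for a background response.
        if resp.status in ("completed", "failed", "cancelled", "incomplete"):
            return resp
        time.sleep(poll_s)
    # Timed out: explicitly cancel the pending/running job instead of
    # leaving it queued, so a resubmission can't produce a duplicate charge.
    return client.responses.cancel(response_id)
```

If the job finishes in time you get the completed response; otherwise you get back the cancelled one, and only then is it safe to resubmit.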


On that doc page about background requests, it also says that terminating the connection will cancel a synchronous request. But doing so will still charge you. I was hoping I would not be charged, at least for the output tokens.

My question is: are we charged for cancelled background requests? I assume at least for the input tokens, but what about the output tokens?
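One empirical way to answer this is to retrieve the cancelled response and inspect its `usage` field, which on Responses API objects reports `input_tokens` and `output_tokens`. A small hedged helper (my own sketch, assuming that usage shape):

```python
def billed_tokens(resp):
    """Return (input_tokens, output_tokens) reported on a response object.

    Assumes the usage shape of the Responses API; returns (0, 0) when the
    server reported no usage at all (the hoped-for case for a cancelled job).
    """
    usage = getattr(resp, "usage", None)
    if usage is None:
        return (0, 0)
    return (getattr(usage, "input_tokens", 0) or 0,
            getattr(usage, "output_tokens", 0) or 0)
```

Calling this on a response you cancelled mid-queue versus one cancelled mid-generation would show whether output tokens were actually metered before the cancel took effect.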
