Responses API: create run -> fetch result?

The Assistants API, slated for deprecation within a year, lets you create a run that executes asynchronously and then query for the result later using the run ID.
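For reference, the pattern we depend on looks roughly like this (a sketch assuming the official `openai` Python SDK; the helper names and the terminal-status set are my own, and the assistant ID is a placeholder):

```python
# Assistants-style "create run now, poll from anywhere later" pattern.
# Assumes the official `openai` Python SDK; IDs below are placeholders.

TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired"}

def start_run(client, assistant_id, prompt):
    """Kick off a run and return the IDs to persist (e.g. in a queue or DB)."""
    thread = client.beta.threads.create(
        messages=[{"role": "user", "content": prompt}]
    )
    run = client.beta.threads.runs.create(
        thread_id=thread.id, assistant_id=assistant_id
    )
    return thread.id, run.id

def poll_run(client, thread_id, run_id):
    """Check on the run from any process that holds the stored IDs."""
    run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
    return run.status, run.status in TERMINAL_STATUSES
```

The key property for serverless: the function that calls `start_run` can exit immediately, and a completely different invocation can call `poll_run` with the stored IDs.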

Is this possible with the Responses API? If it is, it wasn’t immediately obvious to me how.

Many of us are running in Serverless environments (Vercel / Lambda / Cloud Run) and:

  1. the compute cost would be orders of magnitude lower if we didn’t have to keep a function alive just to wait on a response, and
  2. when re-deploying code, providers often enforce a short grace period before the container must stop, so pending requests to OpenAI would be lost without additional complex architecture (messaging or database state management).

(And that’s setting aside the point that some providers don’t even allow hobbyists a long enough timeout.)

Responses is, like Chat Completions, inherently a keep-alive open-connection service, whether you are waiting for a final HTTP result or have enabled a stream of events.

So it is not possible.

Its statefulness comes from being able to send a past response ID in a follow-up request, not from an independent chat object that actions can be kicked off against.
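That chaining looks like the sketch below (assuming the `openai` Python SDK; `follow_up` is an illustrative helper name). Note that each call must itself run to completion before you have an ID to chain from:

```python
def follow_up(client, model, prev_response_id, user_text):
    """Continue a conversation by linking to a *finished* prior response.
    The call still blocks until the new response completes -- there is no
    fire-and-forget step anywhere in this chain."""
    resp = client.responses.create(
        model=model,
        previous_response_id=prev_response_id,
        input=user_text,
    )
    return resp.id, resp.output_text
```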


There is an endpoint for retrieving the object and output of a past completed response by its response_id, but that ID can only be known (and the content obtained) via the successful initial Responses call.
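The retrieval itself is trivial (sketch assuming the `openai` Python SDK; the helper name is mine) — the catch is purely that nothing hands you the ID asynchronously:

```python
def fetch_past_response(client, response_id):
    """Retrieve a past response's object and output -- but you can only know
    this ID if the original, blocking call already succeeded."""
    resp = client.responses.retrieve(response_id)
    return resp.status, resp.output_text
```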

Closing the connection on a stream stops generation on Chat Completions, and likely the same applies here, so the idea of grabbing the response ID from the first stream event and then closing the connection probably would not work. There is a field documented as “Details about why the response is incomplete,” but without more documentation, that could just be reporting “max tokens too small” rather than a pollable status.
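For concreteness, the tempting workaround would look like this (a sketch assuming the `openai` Python SDK; the helper name is mine, and the caveat is exactly the one above — closing the stream most likely cancels generation server-side):

```python
def grab_id_and_bail(client, model, prompt):
    """Take the response ID from the first stream event, then drop the
    connection. If closing aborts generation (as on Chat Completions),
    the ID would point at an incomplete response -- so this is likely futile."""
    stream = client.responses.create(model=model, input=prompt, stream=True)
    for event in stream:
        if event.type == "response.created":
            response_id = event.response.id
            stream.close()  # probably cancels generation server-side
            return response_id
    return None
```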

There is talk of bringing the Responses API up to feature parity with Assistants, but that might not extend to fire-and-forget runs or event triggers.

Thanks for the response! You can see why this would be a challenge for developers, correct?

Would you recommend using the Assistants API for now despite it being slated for deprecation?

Assistants has drawbacks, especially compared with what Responses offers when you send your own list of input messages every time instead of relying on state persistence: conversation-length management you can build yourself, and tailored knowledge tools you can infuse into the AI.
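Self-managed state on Responses can be as simple as trimming and resending your own message list on every call (an illustrative sketch; the helper name and `max_turns` default are mine):

```python
def build_input(history, user_text, max_turns=10):
    """Keep only the most recent turns, then append the new user message.
    The full list is sent as `input` on every Responses call, so no
    server-side state is needed and you control the context window."""
    trimmed = history[-(2 * max_turns):]  # one user + one assistant item per turn
    return trimmed + [{"role": "user", "content": user_text}]
```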

However, the independent quick calls of Assistants, even with the added delays and the orchestration you must build yourself, still seem to hold an advantage for your use case.

The main impediment in many hosting environments is hitting a platform timeout while waiting for a long reasoning response to be returned. Responses’ streaming now emits more event types, which may serve as more of a keep-alive if your platform force-closes connections after 60 seconds of inactivity.