What is the best practice for keeping containers alive?

Hi community

I am using OpenAI’s Agents SDK and creating a code_interpreter (or file_search) tool bound to a container like this:

# Create a container
container = await openai_client.async_openai.containers.create(**container_params)

tool_config = {
    "type": "code_interpreter",
    "container": container.id,
}
tool = CodeInterpreterTool(tool_config=tool_config)

agent = Agent(
    name=f"Agent-{assistant.name}",
    instructions=instructions,
    model=openai_model,
    tools=[tool],
    model_settings=ModelSettings(
        temperature=assistant.temperature,
        top_p=assistant.top_p,
        tool_choice="auto",
    ),
)
# Perform agent operations (Runner comes from the agents package; user_input is a placeholder)
result = await Runner.run(agent, user_input)

To prevent the container from expiring (since we want to reuse the previous_response_id in subsequent calls), we attempt to keep it alive by periodically retrieving it:

# Every 5 minutes, ping the container to keep it alive
await openai_client.async_openai.containers.retrieve(container.id)
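
For reference, the periodic ping could be wrapped in a small background task. This is only a minimal sketch: `keep_alive` is a hypothetical helper (not part of any SDK), and `ping` stands for whatever call refreshes the container, for example the `retrieve()` call above.

```python
import asyncio
from typing import Awaitable, Callable


async def keep_alive(
    ping: Callable[[], Awaitable[object]],
    interval_seconds: float,
    stop: asyncio.Event,
) -> int:
    """Call `ping` every `interval_seconds` until `stop` is set.

    Hypothetical helper. Returns the number of pings that completed.
    Exceptions raised by `ping` are not swallowed, so the caller can see them.
    """
    pings = 0
    while not stop.is_set():
        await ping()
        pings += 1
        try:
            # Sleep for the interval, but wake up at once if `stop` is set.
            await asyncio.wait_for(stop.wait(), timeout=interval_seconds)
        except asyncio.TimeoutError:
            pass  # Interval elapsed without a stop request: ping again.
    return pings


# Possible usage inside the application's event loop (names from the post above):
#   stop = asyncio.Event()
#   task = asyncio.create_task(
#       keep_alive(
#           lambda: openai_client.async_openai.containers.retrieve(container.id),
#           interval_seconds=300,
#           stop=stop,
#       )
#   )
#   ...
#   stop.set(); await task
```

This only automates the ping. It does not by itself address whether pings can extend the container's lifetime beyond the idle timeout.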

However, it appears that even with periodic retrieve() calls, the container may still expire after a certain period (in some observed cases, possibly around 24 hours).

  • Is there a recommended way to extend the lifetime of a container?
  • Is periodic retrieve() sufficient to prevent timeout?
  • Any best practices for long-lived agents or stateful workflows?

Thank you!

It would seem the most reliable way is to make a separate API call with a cheap model that can still write code, with top_p: 0.0, referencing the container ID in the tool parameter. Ask it to send a very simple, zero-impact script, such as “Give me your Python notebook environment’s system time by immediately sending a script to the python tool”, every 10 minutes or so.

With billing at “$0.03” per container, this should be cheaper than losing the session container, along with the chat’s data and work, and being unable to continue. OpenAI probably does not intend this pattern, but oh, well.

A test call used under 500 tokens on a “nano” model, about $0.00005 (plus the $0.03 container fee to show you).

Here’s an idea you can automate: prompt with the exact code to run, and use structured output to report a failure to maintain notebook state (although you should expect an API error if the container ID no longer exists).
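
A rough sketch of what such an automated keep-alive request might look like, assuming the Responses API with a code_interpreter tool pointed at the existing container ID. The helper name `build_keepalive_request`, the default model `gpt-4.1-nano`, and `tool_choice="required"` are assumptions added for illustration, not something confirmed in this thread. The structured-output part is left out.

```python
from typing import Any, Dict

# The exact script the model is asked to run. It only prints the time.
KEEPALIVE_SCRIPT = "import time; print(time.time())"


def build_keepalive_request(
    container_id: str, model: str = "gpt-4.1-nano"
) -> Dict[str, Any]:
    """Build keyword arguments for a Responses API call (hypothetical helper).

    The request asks a cheap model to run one trivial script in the
    existing container, which counts as using the container.
    """
    return {
        "model": model,  # assumption: any cheap model able to call the tool
        "top_p": 0.0,  # as suggested above, to keep the output predictable
        "tools": [{"type": "code_interpreter", "container": container_id}],
        "tool_choice": "required",  # assumption: force a tool call
        "input": (
            "Immediately send exactly this script to the python tool "
            "and report its output, nothing else:\n" + KEEPALIVE_SCRIPT
        ),
    }


# Possible usage with the async client from the original post:
#   response = await openai_client.async_openai.responses.create(
#       **build_keepalive_request(container.id)
#   )
# If the container has already expired, expect the call to raise an API error.
```

Keeping the request construction separate from the call makes it easy to log or inspect what is sent every 10 minutes.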

Thanks for the insights you have shared!

But I am not sure whether this would also make the container live longer. From the documentation, I understand the disclaimer that containers are ephemeral, but I think what we need from OpenAI is more clarity about the actual lifecycle limits of a container: https://platform.openai.com/docs/guides/tools-code-interpreter

Expiration
We highly recommend you treat containers as ephemeral and store all data related to the use of this tool on your own systems.
A container expires if it is not used for 20 minutes. When this happens, using the container in v1/responses will fail. You’ll still be able to see a snapshot of the container’s metadata at its expiry, but all data associated with the container will be discarded from our systems and not recoverable.
Any container operation—like retrieving the container, or adding or deleting files—will automatically refresh the container’s last_active_at time.
You can’t move a container from an expired state back to active. Instead, you must create a new container and re-upload files. State in memory (e.g., Python objects) will be lost.
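
Given that an expired container cannot be revived, one defensive pattern is to retry once with a fresh container when a run fails because of expiry. The sketch below is only an illustration: `run_with_container` and its callable parameters are hypothetical, and how to recognize an expiry error (`is_expired_error`) depends on what the API actually returns, which is not specified here.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple


async def run_with_container(
    container_id: Optional[str],
    create_container: Callable[[], Awaitable[str]],
    run: Callable[[str], Awaitable[object]],
    is_expired_error: Callable[[Exception], bool],
) -> Tuple[str, object]:
    """Run `run(container_id)`, recreating the container once if it expired.

    `create_container` is expected to create a container AND re-upload any
    files it needs, returning the new container ID.
    Returns (container_id_actually_used, result).
    """
    if container_id is None:
        container_id = await create_container()
    try:
        return container_id, await run(container_id)
    except Exception as exc:
        if not is_expired_error(exc):
            raise  # Unrelated failure: do not hide it.
    # The old container is gone. In-memory state is lost; start fresh.
    container_id = await create_container()
    return container_id, await run(container_id)


async def _demo() -> None:
    # Stand-ins for real API calls, for illustration only.
    async def create() -> str:
        return "cntr_new"

    async def run(cid: str) -> str:
        if cid == "cntr_old":
            raise RuntimeError("container expired")
        return f"ran in {cid}"

    print(await run_with_container(
        "cntr_old", create, run, lambda e: "expired" in str(e)
    ))


if __name__ == "__main__":
    asyncio.run(_demo())
```

This does not preserve Python objects held in memory, so anything needed after a restart still has to be stored on your own systems, as the documentation recommends.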

So that is why I’m calling retrieve() every 5 minutes to keep the container active and prevent the 20-minute idle timeout. This seems to work in the short term. However, I’ve observed that even with regular activity, containers still seem to expire after a longer duration, possibly around 24 hours. That’s why I need some guidance.
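
To pin down whether a failure comes from the 20-minute idle rule or from some longer limit, it may help to log the container's age and idle time whenever a call fails. A small sketch, assuming the container metadata exposes `created_at` and `last_active_at` as Unix timestamps in seconds (the `created_at` field and the units are assumptions; `container_lifetime` is a hypothetical helper):

```python
from dataclasses import dataclass

IDLE_TIMEOUT_SECONDS = 20 * 60  # the documented 20-minute idle timeout


@dataclass
class Lifetime:
    age_seconds: float   # time since the container was created
    idle_seconds: float  # time since the last recorded activity
    idle_expired: bool   # True if the idle rule alone would explain expiry


def container_lifetime(
    created_at: float, last_active_at: float, now: float
) -> Lifetime:
    """Summarize how old and how idle a container is at time `now`."""
    idle = now - last_active_at
    return Lifetime(
        age_seconds=now - created_at,
        idle_seconds=idle,
        idle_expired=idle >= IDLE_TIMEOUT_SECONDS,
    )


# Example: created 24 hours ago, last pinged 5 minutes ago.
info = container_lifetime(created_at=0, last_active_at=86_100, now=86_400)
print(info)
```

If a container fails while `idle_expired` is False and `age_seconds` is consistently near 86,400, that would support the observation of a separate limit of roughly 24 hours.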