Responses API: Excessive charges due to repetitive container creation when using Code Interpreter

We’re seeing a change in behavior in how the Responses API handles Code Interpreter tool calls compared to the Assistants API, which can result in excessive charges for Code Interpreter sessions. We’ve reported the issue to OpenAI, but wanted to pass along our findings in case others are seeing similar usage charges.


When using "container": {"type": "auto"}, the API creates a new container up front, before the model makes a CI tool call. If the model doesn’t make a CI tool call while generating its response, the API returns no information about the newly created container. Because there’s no record of the container created during the first response, the API creates a second container while generating a follow-up response, regardless of which context management approach is used (manual context management, previous_response_id, or the Conversations API).

In short, as long as messages in a conversation do not trigger a Code Interpreter tool call from the model, we are charged an extra $0.30 per response for a container the API creates without surfacing any information about it in the Response object. Only when the model finally makes a CI tool call does the code_interpreter_call item include the container’s ID, which can then be reused across responses.
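Once a code_interpreter_call item does appear, the container can be pinned explicitly on follow-up requests by passing its ID in place of {"type": "auto"}. A minimal sketch of pulling the ID out of a response (the helper assumes output items expose `type` and `container_id` attributes, as code_interpreter_call items do; the commented usage below it is illustrative, not verified against every SDK version):

```python
def find_container_id(output_items):
    """Return the container_id from the first code_interpreter_call
    item in a response's output, or None if the model never called CI."""
    for item in output_items:
        if getattr(item, "type", None) == "code_interpreter_call":
            return getattr(item, "container_id", None)
    return None

# Usage sketch once an ID is found:
# container_id = find_container_id(first_response.output)
# if container_id is not None:
#     client.responses.create(
#         model="gpt-4.1",
#         input="...",
#         tools=[{"type": "code_interpreter", "container": container_id}],
#     )
```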

This behavior is a departure from the Assistants API, where Code Interpreter sessions were only created and charged if the assistant made a CI tool call. It also doesn’t mirror other built-in tools, such as File Search, which are only charged when the model actually makes a tool call (e.g. $2.50 per 1k File Search tool calls).

The relevant Docs section, included below, isn’t clear about whether we should expect a container to be created when the model makes a Code Interpreter tool call, or whenever we submit a Create Response request. We would prefer the former, which would mirror the Assistants API. At a minimum, there should be a way to grab the ID of the container the API created while generating a response, regardless of whether the model made a CI call.

The Code Interpreter tool requires a container object. A container is a fully sandboxed virtual machine that the model can run Python code in. This container can contain files that you upload, or that it generates.

There are two ways to create containers:

  1. Auto mode: as seen in the example above, you can do this by passing the "container": { "type": "auto", "file_ids": ["file-1", "file-2"] } property in the tool configuration while creating a new Response object. This automatically creates a new container, or reuses an active container that was used by a previous code_interpreter_call item in the model’s context. Look for the code_interpreter_call item in the output of this API request to find the container_id that was generated or used.
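Until the behavior changes, a workaround we’ve considered is snapshotting container IDs before and after a request and deleting any container that appeared without a corresponding code_interpreter_call. The diffing helper below is straightforward; the commented usage sketch assumes the Containers API surface used in the repro script, and note that deleting a container may not avoid the charge if the minimum session fee applies at creation time:

```python
def new_container_ids(before_ids, after_ids):
    """Container IDs present after a request but not before it --
    candidates for the silently created containers described above."""
    return sorted(set(after_ids) - set(before_ids))

# Usage sketch (assumes client.containers.list / .delete as in the repro below):
# before = [c.id for c in client.containers.list(limit=100).data]
# response = client.responses.create(...)
# after = [c.id for c in client.containers.list(limit=100).data]
# for cid in new_container_ids(before, after):
#     client.containers.delete(cid)  # may not avoid the session charge
```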

How to reproduce

from openai import OpenAI
import time

client = OpenAI()

def count_all_containers(client):
    total_count = 0
    after = None

    while True:
        page = client.containers.list(after=after, limit=100, order="asc")
        containers = page.data
        total_count += len(containers)

        # Stop on an empty page or when the API reports no further pages
        if not containers or not getattr(page, "has_more", False):
            break

        after = containers[-1].id

    return total_count

def manual_context(client):
    initial_containers = count_all_containers(client)
    print(f"Total number of containers before first request: {initial_containers}")

    conversation_history = [
        {"role": "user", "content": "Write a one-sentence bedtime story about a unicorn."}
    ]

    first_response = client.responses.create(
        model="gpt-4.1",
        input=conversation_history,
        tools=[
            {
                "type": "code_interpreter",
                "container": {"type": "auto"}
            }
        ],
        store=True
    )

    conversation_history.extend(first_response.output)

    conversation_history.append(
        {"role": "user", "content": "Write me another one."}
    )

    # Allow enough time for Containers API to reflect new containers
    time.sleep(30)
    containers_after_first_resp = count_all_containers(client)
    print(f"Total number of containers before second request: {containers_after_first_resp}")

    second_response = client.responses.create(
        model="gpt-4.1",
        input=conversation_history,
        tools=[
            {
                "type": "code_interpreter",
                "container": {"type": "auto"}
            }
        ],
        store=True
    )

    # Allow enough time for Containers API to reflect new containers
    time.sleep(30)
    containers_after_second_resp = count_all_containers(client)
    print(f"Total number of containers after second request: {containers_after_second_resp}")
    print(f"Containers created: {containers_after_second_resp - initial_containers}")

def previous_response_id(client):
    initial_containers = count_all_containers(client)
    print(f"Total number of containers before first request: {initial_containers}")

    first_response = client.responses.create(
        model="gpt-4.1",
        input="Write a one-sentence bedtime story about a unicorn.",
        tools=[
            {
                "type": "code_interpreter",
                "container": {"type": "auto"}
            }
        ],
        store=True
    )

    # Allow enough time for Containers API to reflect new containers
    time.sleep(30)
    containers_after_first_resp = count_all_containers(client)
    print(f"Total number of containers before second request: {containers_after_first_resp}")

    second_response = client.responses.create(
        model="gpt-4.1",
        input="Write me another one.",
        previous_response_id=first_response.id,
        tools=[
            {
                "type": "code_interpreter",
                "container": {"type": "auto"}
            }
        ],
        store=True
    )

    # Allow enough time for Containers API to reflect new containers
    time.sleep(30)
    containers_after_second_resp = count_all_containers(client)
    print(f"Total number of containers after second request: {containers_after_second_resp}")
    print(f"Containers created: {containers_after_second_resp - initial_containers}")

def conversation_api(client):
    initial_containers = count_all_containers(client)
    print(f"Total number of containers before first request: {initial_containers}")

    # Create a new conversation to store the responses
    conversation = client.conversations.create()

    first_response = client.responses.create(
        model="gpt-4.1",
        input="Write a one-sentence bedtime story about a unicorn.",
        tools=[
            {
                "type": "code_interpreter",
                "container": {"type": "auto"}
            }
        ],
        conversation=conversation.id
    )

    # Allow enough time for Containers API to reflect new containers
    time.sleep(30)
    containers_after_first_resp = count_all_containers(client)
    print(f"Total number of containers before second request: {containers_after_first_resp}")

    second_response = client.responses.create(
        model="gpt-4.1",
        input="Write me another one.",
        tools=[
            {
                "type": "code_interpreter",
                "container": {"type": "auto"}
            }
        ],
        conversation=conversation.id
    )

    # Allow enough time for Containers API to reflect new containers
    time.sleep(30)
    containers_after_second_resp = count_all_containers(client)
    print(f"Total number of containers after second request: {containers_after_second_resp}")
    print(f"Containers created: {containers_after_second_resp - initial_containers}")

print('='*50)
print('MANUAL CONTEXT:\n')
manual_context(client)
print('='*50)

print('\n'+'='*50)
print('PREVIOUS RESPONSE ID:\n')
previous_response_id(client)
print('='*50)

print('\n'+'='*50)
print('CONVERSATION API:\n')
conversation_api(client)
print('='*50)

Which prints out:

==================================================
MANUAL CONTEXT:

Total number of containers before first request: 25
Total number of containers before second request: 26
Total number of containers after second request: 27
Containers created: 2
==================================================

==================================================
PREVIOUS RESPONSE ID:

Total number of containers before first request: 27
Total number of containers before second request: 28
Total number of containers after second request: 29
Containers created: 2
==================================================

==================================================
CONVERSATION API:

Total number of containers before first request: 29
Total number of containers before second request: 30
Total number of containers after second request: 31
Containers created: 2
==================================================