Responses API: Excessive charges due to repetitive container creation when using Code Interpreter

Update (Nov 17, 2025): OpenAI Support escalated the bug to the API engineering team a couple of weeks ago, and we’re waiting to hear back. For one specific project, 55% of our total usage cost now goes to Code Interpreter session charges.

We expect our Code Interpreter charges to be refunded once OpenAI confirms that this container-creation behavior is a bug, so we have not implemented a fix. In any case, here’s an overview of how we would patch around the bug until it is resolved, by adding a function call step that the model must complete before the python tool becomes available:
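The sketch below is our own untested pseudocode for that approach (the `enable_python` function tool and the helper names are ours, not part of the OpenAI API): the model only gets the real code_interpreter tool after it explicitly asks for it via a plain function call, so no container is auto-created on responses that never touch Python.

```python
# Workaround sketch: gate the Code Interpreter tool behind a function call,
# so a container is only created when the model actually wants to run Python.
# `enable_python` is a name we made up — this is not an OpenAI API.

# Function tool the model must call before Code Interpreter is exposed.
ENABLE_PYTHON_TOOL = {
    "type": "function",
    "name": "enable_python",
    "description": "Call this when you need to run Python code.",
    "parameters": {"type": "object", "properties": {}},
}

CODE_INTERPRETER_TOOL = {
    "type": "code_interpreter",
    "container": {"type": "auto"},
}

def tools_for_request(python_enabled: bool) -> list:
    """Return the tool list: the CI tool only appears once the gate is open."""
    return [CODE_INTERPRETER_TOOL] if python_enabled else [ENABLE_PYTHON_TOOL]

def wants_python(response) -> bool:
    """True if the model called the gating function in this response."""
    return any(
        getattr(item, "type", None) == "function_call"
        and getattr(item, "name", None) == "enable_python"
        for item in response.output
    )
```

When `wants_python` returns True, we would append the function_call item and a stub function_call_output to the input and re-run the request with `tools_for_request(True)`; a container then gets created only on that second request, where the model actually runs code.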


We’re seeing a change in behavior in how the Responses API handles Code Interpreter tool calls compared to the Assistants API, which can result in excessive charges for Code Interpreter sessions. We’ve reported the issue to OpenAI, but wanted to pass along our findings in case others are seeing similar usage charges.


When using "container": {"type": "auto"}, the API creates a new container before the model has made any CI tool call. If the model doesn’t make a CI tool call while generating its response, the API returns no information about the newly created container. Because that container’s ID never surfaces, the API creates a second container while generating a follow-up response, regardless of which context management approach is used (manual context management, previous_response_id, or the Conversations API).

In short, as long as messages in a conversation do not trigger a Code Interpreter tool call from the model, we are charged an extra $0.03 per response for a container the API creates without surfacing any information in the Response object. Only when the model finally makes a CI tool call does the code_interpreter_call item include the container’s ID, which can then be reused across responses.
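Once an ID does surface, it can be pulled out of the output and passed back as the container value on later requests (the tool config accepts a container ID string in place of {"type": "auto"}). A minimal helper for that, assuming the item exposes a container_id attribute as the docs describe:

```python
def extract_container_id(response):
    """Return the container ID from the first code_interpreter_call item, if any."""
    for item in response.output:
        if getattr(item, "type", None) == "code_interpreter_call":
            return getattr(item, "container_id", None)
    return None

# Reuse on a follow-up request (sketch):
#   tools=[{"type": "code_interpreter",
#           "container": extract_container_id(first_response) or {"type": "auto"}}]
```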

This behavior is a departure from the Assistants API, where Code Interpreter sessions were only created and charged if the assistant made a CI tool call. It also doesn’t mirror other built-in tools, such as File Search, which are only charged when the model actually makes a tool call (e.g. $2.50 per 1k File Search tool calls).

The relevant Docs section, included below, isn’t clear about whether a container should be created when the model makes a Code Interpreter tool call, or whenever we submit a Create Response request. We would prefer the former, which would mirror the Assistants API. At minimum, there should be a way to grab the container ID the API created while generating a response, regardless of whether the model made a CI call.

The Code Interpreter tool requires a container object. A container is a fully sandboxed virtual machine that the model can run Python code in. This container can contain files that you upload, or that it generates.

There are two ways to create containers:

  1. Auto mode: as seen in the example above, you can do this by passing the "container": { "type": "auto", "file_ids": ["file-1", "file-2"] } property in the tool configuration while creating a new Response object. This automatically creates a new container, or reuses an active container that was used by a previous code_interpreter_call item in the model’s context. Look for the code_interpreter_call item in the output of this API request to find the container_id that was generated or used.
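The docs’ second creation path (explicit mode, not quoted above) is worth noting here: creating one container up front via the Containers API and pinning the tool to its ID sidesteps auto-creation entirely. A sketch, assuming the client.containers.create call and string container value from the current Python SDK:

```python
def pinned_code_interpreter_tool(container_id: str) -> dict:
    """Tool config that pins Code Interpreter to one pre-created container."""
    return {"type": "code_interpreter", "container": container_id}

# Explicit mode (untested sketch): create one container up front and reuse
# it for every request, so only one session is ever billed.
#
#   client = OpenAI()
#   container = client.containers.create(name="my-session")
#   response = client.responses.create(
#       model="gpt-4.1",
#       input="Compute 2**32.",
#       tools=[pinned_code_interpreter_tool(container.id)],
#   )
```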

How to reproduce

from pprint import pprint
from openai import OpenAI
import time

client = OpenAI()

def count_all_containers(client):
    total_count = 0
    after = None

    while True:
        page = client.containers.list(after=after, limit=100, order="asc")
        containers = page.data
        total_count += len(containers)

        # Stop when there are no more pages (also guards against an empty page)
        if not containers or not getattr(page, "has_more", False):
            break

        after = containers[-1].id

    return total_count

def print_response_details(resp, label: str):
    print(f"\n=== {label} ===")
    print("Response ID:", resp.id)

    # Show raw item types for debugging
    print("--- Raw output types ---")
    for i, item in enumerate(resp.output):
        print(f"  [{i}] type={getattr(item, 'type', None)} class={type(item).__name__}")

    # 1) Text content
    print("\n--- Text content ---")
    for item in resp.output:
        if hasattr(item, "content"):
            for c in item.content:
                if getattr(c, "type", None) == "output_text":
                    print(c.text)

    # 2) Tool calls (Code Interpreter)
    print("\n--- Tool calls (if any) ---")
    used_any_tools = False
    for item in resp.output:
        item_type = getattr(item, "type", None)

        # Code Interpreter calls surface as items of type "code_interpreter_call"
        if item_type == "code_interpreter_call":
            used_any_tools = True
            print("Code Interpreter tool call:")
            try:
                pprint(item.model_dump())   # pydantic model -> dict
            except AttributeError:
                pprint(item.__dict__)

    if not used_any_tools:
        print("No tools were invoked in this response.")
    
def manual_context(client):
  intial_containers = count_all_containers(client)
  print(f"\n=== Container Count ===\nTotal number of containers before first request: {intial_containers}")

  conversation_history = [
      {"role": "user", "content": "Write a one-sentence bedtime story about a unicorn."}
  ]

  first_response = client.responses.create(
      model="gpt-4.1",
      input=conversation_history,
      tools=[
          {
              "type": "code_interpreter",
              "container": {"type": "auto"}
          }
      ],
      store=True
  )
  print_response_details(first_response, "First Response, Manual Context")

  conversation_history.extend(first_response.output)

  conversation_history.append(
      {"role": "user", "content": "Write me another one."}
  )

  # Allow enough time for Containers API to reflect new containers
  time.sleep(30)
  containers_after_first_resp = count_all_containers(client)
  print(f"\n=== Container Count ===\nTotal number of containers before second request: {containers_after_first_resp}")

  second_response = client.responses.create(
      model="gpt-4.1",
      input=conversation_history,
      tools=[
          {
              "type": "code_interpreter",
              "container": {"type": "auto"}
          }
      ],
      store=True
  )
  print_response_details(second_response, "Second Response, Manual Context")

  # Allow enough time for Containers API to reflect new containers
  time.sleep(30)
  containers_after_second_resp = count_all_containers(client)
  print(f"\n=== Container Count ===\nTotal number of containers after second request: {containers_after_second_resp}")
  print(f"Containers created: {containers_after_second_resp - intial_containers}")

def previous_response_id(client):
  intial_containers = count_all_containers(client)
  print(f"\n=== Container Count ===\nTotal number of containers before first request: {intial_containers}")

  first_response = client.responses.create(
      model="gpt-4.1",
      input="Write a one-sentence bedtime story about a unicorn.",
      tools=[
          {
              "type": "code_interpreter",
              "container": {"type": "auto"}
          }
      ],
      store=True
  )
  print_response_details(first_response, "First Response, Previous Response ID")

  # Allow enough time for Containers API to reflect new containers
  time.sleep(30)
  containers_after_first_resp = count_all_containers(client)
  print(f"\n=== Container Count ===\nTotal number of containers before second request: {containers_after_first_resp}")

  second_response = client.responses.create(
      model="gpt-4.1",
      input="Write me another one.",
      previous_response_id=first_response.id,
      tools=[
          {
              "type": "code_interpreter",
              "container": {"type": "auto"}
          }
      ],
      store=True
  )
  print_response_details(second_response, "Second Response, Previous Response ID")

  # Allow enough time for Containers API to reflect new containers
  time.sleep(30)
  containers_after_second_resp = count_all_containers(client)
  print(f"\n=== Container Count ===\nTotal number of containers after second request: {containers_after_second_resp}")
  print(f"Containers created: {containers_after_second_resp - intial_containers}")

def conversation_api(client):
  intial_containers = count_all_containers(client)
  print(f"\n=== Container Count ===\nTotal number of containers before first request: {intial_containers}")

  # Create a new conversation to store the responses
  conversation = client.conversations.create()
  
  first_response = client.responses.create(
      model="gpt-4.1",
      input="Write a one-sentence bedtime story about a unicorn.",
      tools=[
          {
              "type": "code_interpreter",
              "container": {"type": "auto"}
          }
      ],
      conversation=conversation.id
  )
  print_response_details(first_response, "First Response, Conversation API")

  # Allow enough time for Containers API to reflect new containers
  time.sleep(30)
  containers_after_first_resp = count_all_containers(client)
  print(f"\n=== Container Count ===\nTotal number of containers before second request: {containers_after_first_resp}")

  second_response = client.responses.create(
      model="gpt-4.1",
      input="Write me another one.",
      tools=[
          {
              "type": "code_interpreter",
              "container": {"type": "auto"}
          }
      ],
      conversation=conversation.id
  )
  print_response_details(second_response, "Second Response, Conversation API")

  # Allow enough time for Containers API to reflect new containers
  time.sleep(30)
  containers_after_second_resp = count_all_containers(client)
  print(f"\n=== Container Count ===\nTotal number of containers after second request: {containers_after_second_resp}")
  print(f"Containers created: {containers_after_second_resp - intial_containers}")

if __name__ == "__main__":
  print("=== Manual Context Management ===")
  manual_context(client)

  print("\n\n=== Using Previous Response ID ===")
  previous_response_id(client)

  print("\n\n=== Using Conversation API ===")
  conversation_api(client)

Which prints out:

=== Manual Context Management ===

=== Container Count ===
Total number of containers before first request: 0

=== First Response, Manual Context ===
Response ID: resp_062c9abf58c3e2fb00690909a6ee3c81959c18e4f8c3e22008
--- Raw output types ---
  [0] type=message class=ResponseOutputMessage

--- Text content ---
As the silver moonlight danced across the fields, the gentle unicorn tiptoed among the dreaming daisies, spreading sweet dreams with every twinkle of her magical horn.

--- Tool calls (if any) ---
No tools were invoked in this response.

=== Container Count ===
Total number of containers before second request: 1

=== Second Response, Manual Context ===
Response ID: resp_062c9abf58c3e2fb00690909ccfdd8819593f0397865b2703d
--- Raw output types ---
  [0] type=message class=ResponseOutputMessage

--- Text content ---
Bathed in starlight, the sleepy unicorn curled up beneath a willow tree, whispering wishes to the night sky before drifting into a world of rainbow dreams.

--- Tool calls (if any) ---
No tools were invoked in this response.

=== Container Count ===
Total number of containers after second request: 2
Containers created: 2


=== Using Previous Response ID ===

=== Container Count ===
Total number of containers before first request: 2

=== First Response, Previous Response ID ===
Response ID: resp_0ddcb712ffe454df00690909f46af0819ea851004c04793d42
--- Raw output types ---
  [0] type=message class=ResponseOutputMessage

--- Text content ---
As the moonlight sparkled on the meadow, a gentle unicorn named Luna closed her eyes and drifted into dreams where she soared above the clouds on a shimmering rainbow.

--- Tool calls (if any) ---
No tools were invoked in this response.

=== Container Count ===
Total number of containers before second request: 3

=== Second Response, Previous Response ID ===
Response ID: resp_0ddcb712ffe454df0069090a18f01c819e938eae2396e48d64
--- Raw output types ---
  [0] type=message class=ResponseOutputMessage

--- Text content ---
In a quiet, starry forest, a tiny unicorn tiptoed through silver mist, whispering sweet dreams to every sleepy creature she met along the way.

--- Tool calls (if any) ---
No tools were invoked in this response.

=== Container Count ===
Total number of containers after second request: 4
Containers created: 2


=== Using Conversation API ===

=== Container Count ===
Total number of containers before first request: 4

=== First Response, Conversation API ===
Response ID: resp_04d0ec113c5f501e0069090a3daa0081909736e69d8415ae66
--- Raw output types ---
  [0] type=message class=ResponseOutputMessage

--- Text content ---
As the moonlight sparkled on the dewy grass, a gentle unicorn tiptoed through the enchanted forest, leaving a trail of twinkling stars behind for all the dreaming animals to follow.

--- Tool calls (if any) ---
No tools were invoked in this response.

=== Container Count ===
Total number of containers before second request: 5

=== Second Response, Conversation API ===
Response ID: resp_04d0ec113c5f501e0069090a61e98c81909d01d1b3367baa5b
--- Raw output types ---
  [0] type=message class=ResponseOutputMessage

--- Text content ---
Beneath a sky full of sleepy clouds, a unicorn with a shimmering silver mane sang lullabies to the fireflies, who danced around her horn until everyone drifted into dreamland.

--- Tool calls (if any) ---
No tools were invoked in this response.

=== Container Count ===
Total number of containers after second request: 6
Containers created: 2

We see the same behavior using Azure OpenAI Service. This basically renders Code Interpreter unusable for our use case.


We just pushed a fix! We no longer charge for unused containers - thanks for the reports. Let us know how everything looks on your end.
