Horrible Code Interpreter Bug

The Code Interpreter has a serious, frankly unacceptable bug.
When you use the "auto" method for container creation but send chat inputs that don’t call the tool, the API keeps creating new containers you don’t need at all. You get billed an extra $0.03 for every single message, even if you just ask simple questions over and over again.

We didn’t notice this at first, because we never expected such behavior. But in our multi-user chatbot application, after switching to the Responses API this ended up creating 100-150k containers in a short time, costing us several thousand dollars in unexpected charges.

It would be highly desirable if the Code Interpreter behaved like it does in the Assistants API, or at least provided reasonable manual options to manage container creation and reuse. If you try to implement some kind of reuse mechanism yourself, several other problems appear (but that’s another topic… you’ll know what I mean if you’ve already fought with the interpreter).
I honestly can’t imagine any use case where this current behavior would ever be intentional or useful.


We see the exact same behavior. This should be fixed soon!

The issue is that the Responses API creates a new Code Interpreter container whenever the Code Interpreter tool is passed along as available, but doesn’t return information about that container if the model doesn’t actually call Code Interpreter and return a code_interpreter_call output item.

Example Conversation
[User] Good morning
— Container #1 created —
[Assistant] Hello

[User] What is 5 + 5?
— Container #2 created —
[Assistant] 10

[User] Create a CSV with 10 random emails
— Container #3 created —
— Code Interpreter Tool called —
[Assistant] Here’s the file:…

[User] Generate 10 more
— Container #3 reused —
— Code Interpreter Tool called —
[Assistant] Here’s the file:…

So you’re paying much more for chats where the model doesn’t actually use the container, which is ironic. Once you get the model to use CI, costs go way down because you’re reusing the same container. See this post for replication code:

It’s a hard problem to convince others is happening because the only way to verify you’re creating new containers is to use the List Containers endpoint in the Containers API. OpenAI Support escalated the bug to the API engineering team a couple weeks ago and we’re waiting to hear back.
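For anyone who wants to check this on their own account, here is a minimal sketch of that verification idea: snapshot the container IDs from the List Containers endpoint before and after a request, then diff. The `client` is assumed to be an `openai.OpenAI` instance, and the model name is just a placeholder.

```python
def list_container_ids(client) -> set[str]:
    # First page of the List Containers endpoint; newest containers
    # appear here, which is enough to catch fresh creations.
    return {c.id for c in client.containers.list(limit=100).data}

def newly_created(before_ids: set[str], after_ids: set[str]) -> set[str]:
    """Container IDs present after the request but not before it."""
    return after_ids - before_ids

def probe(client) -> set[str]:
    # Send a message that should NOT need the tool, with the tool
    # offered in "auto" mode, and see whether a container appears.
    before = list_container_ids(client)
    client.responses.create(
        model="gpt-4.1",  # placeholder model name
        input="Good morning",
        tools=[{"type": "code_interpreter", "container": {"type": "auto"}}],
    )
    return newly_created(before, list_container_ids(client))
```

If the bug is hitting you, `probe()` comes back non-empty even though the response contains no `code_interpreter_call` item.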

eta/ We haven’t implemented this yet, but if I were to deal with the issue in the meantime, I’d try a function call step that’s needed before the model has the python tool available:
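Something along these lines — a sketch only, since we haven’t built it. The gating tool name `enable_code_interpreter` and the model name are made up for the example; the `client` is assumed to be an `openai.OpenAI` instance.

```python
# Phase 1 offers only a plain function tool; the real code_interpreter
# tool (and its container) is attached only if the model asks for it.
GATE_TOOL = {
    "type": "function",
    "name": "enable_code_interpreter",
    "description": "Call this when you need to run Python code.",
    "parameters": {"type": "object", "properties": {}},
}

def wants_code_interpreter(output_items) -> bool:
    """True if the model called the gating function tool."""
    return any(
        item.get("type") == "function_call"
        and item.get("name") == "enable_code_interpreter"
        for item in output_items
    )

def respond(client, user_input: str):
    first = client.responses.create(
        model="gpt-4.1",  # placeholder model name
        input=user_input,
        tools=[GATE_TOOL],
    )
    items = [item.model_dump() for item in first.output]
    if not wants_code_interpreter(items):
        return first  # no container was ever created for this turn
    # Only now pay for a container, by re-running with the real tool.
    return client.responses.create(
        model="gpt-4.1",
        input=user_input,
        tools=[{"type": "code_interpreter", "container": {"type": "auto"}}],
    )
```

The cost is one extra model round trip on turns that do need Python, in exchange for zero container charges on turns that don’t.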

Neither can I. I’m sure it’s not deliberate by OpenAI to earn revenue.

I guess we will go with our own custom tool that calls our own code interpreter. Very annoying… But thanks for your suggestion of the separate pre-function call @ekassos


The workaround is to create a container yourself and manually supply its ID. You get billed for the creation regardless, but you get an ID you can persist and reuse to avoid new creations.
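A hedged sketch of that workaround, assuming an `openai.OpenAI` client and a placeholder model name: create the container once per conversation through the Containers API, persist the ID, and pin it in the tool spec instead of `{"type": "auto"}`.

```python
def code_interpreter_tool(container_id: str) -> dict:
    """Tool spec pinned to an existing container instead of auto-creation."""
    return {"type": "code_interpreter", "container": container_id}

def start_session(client, session_name: str) -> str:
    # Create the container once per conversation and persist its ID
    # (DB, session store, ...) alongside the chat history.
    container = client.containers.create(name=session_name)
    return container.id

def ask(client, container_id: str, user_input: str):
    # Every turn reuses the same pinned container, so no new
    # containers get provisioned behind your back.
    return client.responses.create(
        model="gpt-4.1",  # placeholder model name
        input=user_input,
        tools=[code_interpreter_tool(container_id)],
    )
```

Keep in mind the expiry caveat discussed below: a pinned ID stored in a long-lived conversation can go stale.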

Clever programming: take an uncalled container after a response is done, detach and pool it, and pass it on to the next user API call that needs one, spreading the cost of invoking just a single one (or a handful) without use, and keep it alive. Only commit a container to the session when it is actually called upon. Up the cleverness by passing the container to a “reset Jupyter and delete mount point” AI call and re-pooling it if the user says “bye”.

Or: offer a python tool of your own, “I want to program()”, that turns code interpreter on: “program away, pal”.

The ContainerPool is a nice idea in principle, but if you follow the approach of keeping 5–10 containers ready and calling a function before each request to determine whether a container needs to be attached to a session, this introduces significant overhead—especially considering that, in our use case, only a small percentage of requests actually require the interpreter. Another issue with this approach is that containers can expire, which disrupts conversations if a reference to an expired container is stored in history and later used as context. You would need to constantly check for this and create new containers as needed, or somehow keep the container alive as long as the conversation exists (in our case, around 30 days). Overall, all of these approaches are too fragile for our application and involve too much workaround logic, particularly since the Assistants API already works as expected right out of the box.

If you’re going the custom route with a workaround, I would strongly suggest against creating containers of your own that mirror the Responses API’s current behavior. You’ll still get charged $0.03 per new conversation that may not require Code Interpreter. You may even get charged more if user messages are sent more than 20 minutes apart, at which point you’ll need a new, unexpired container.

Every other suggestion is reasonable. I would still recommend a just-in-time container creation approach so you won’t get hit with the latency of creating a container on the first message, when you may not need it.

If you follow the first approach, you need 0 containers “ready”, and only as many speculative containers created as you have concurrent initial chats that have not called for the tool but need their own ID.

The second “turn on python” tool idea only creates containers on demand, assigned to a session/conversation. The big fault: if you aren’t self-managing (which requires creating an ID anyway) but are using a server-side chat state product, that will break extremely quickly—even within the amount of thinking and typing someone might do before their next input.

You have identified: not a single Responses hosted tool works like it should or like you would want; they all imagine you want exactly ChatGPT (and now gpt-5.1 even has an anti-developer unstoppable “you chat” tune-up system message).

The ongoing “auto” bug here: you get charged $0.03 for every new user input! 10 “hello” messages added $0.30 and still didn’t provide you a code interpreter ID to reuse.

Thanks everyone and sorry about this. We just pushed a change to stop billing for unused containers. It should be live now, so let us know how everything looks for you.

Are there refunds for containers that were accidentally provisioned? If yes, is the refund process the same for those created via Azure OpenAI Services?

Great question. Please feel free to write into support@openai.com with the amount we overcharged and share your Case # here so I can follow up with you directly! Thank you.
