Reliably retrieving code interpreter files from the container?

Is there any way to directly access the files created at /mnt/data in the code interpreter sandbox or otherwise reliably retrieve files created by the code interpreter?

For example, it would be great to have the model create a plot, direct the LLM to save the CSV used to create that plot, and be able to retrieve both.

The docs claim that:

When running Code Interpreter, the model can create its own files. For example, if you ask it to construct a plot, or create a CSV, it creates these images directly on your container. When it does so, it cites these files in the annotations of its next message. Here’s an example:

{
  "id": "msg_682d514e268c8191a89c38ea318446200f2610a7ec781a4f",
  "content": [
    {
      "annotations": [
        {
          "file_id": "cfile_682d514b2e00819184b9b07e13557f82",
          "index": null,
          "type": "container_file_citation",
          "container_id": "cntr_682d513bb0c48191b10bd4f8b0b3312200e64562acc2e0af",
          "end_index": 0,
          "filename": "cfile_682d514b2e00819184b9b07e13557f82.png",
          "start_index": 0
        }
      ],
      "text": "Here is the histogram of the RGB channels for the uploaded image. Each curve represents the distribution of pixel intensities for the red, green, and blue channels. Peaks toward the high end of the intensity scale (right-hand side) suggest a lot of brightness and strong warm tones, matching the orange and light background in the image. If you want a different style of histogram (e.g., overall intensity, or quantized color groups), let me know!",
      "type": "output_text",
      "logprobs": []
    }
  ],
  "role": "assistant",
  "status": "completed",
  "type": "message"
}

You can download these constructed files by calling the get container file content method.
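For reference, here is roughly how I do that retrieval over raw HTTP — a minimal sketch, assuming the "get container file content" method maps onto GET /v1/containers/{container_id}/files/{file_id}/content and that the list endpoint returns the standard envelope with a data array (the field names on the file objects are my guess):

import os
import requests

API_KEY = os.environ["OPENAI_API_KEY"]
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
BASE = "https://api.openai.com/v1"

def list_container_files(container_id):
    # List the files currently visible on the container.
    # Assumes the usual OpenAI list envelope with a "data" array.
    resp = requests.get(f"{BASE}/containers/{container_id}/files", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()["data"]

def download_container_file(container_id, file_id, dest_path):
    # Assumption: the "get container file content" method is exposed at
    # /containers/{container_id}/files/{file_id}/content.
    url = f"{BASE}/containers/{container_id}/files/{file_id}/content"
    resp = requests.get(url, headers=HEADERS)
    resp.raise_for_status()
    with open(dest_path, "wb") as fh:
        fh.write(resp.content)

# Usage: grab everything the container currently exposes.
container_id = "cntr_..."  # placeholder
for f in list_container_files(container_id):
    download_container_file(container_id, f["id"], f["id"])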

However, I’ve found that whether these files are actually cited or otherwise made available after the run is a bit of a crapshoot.

If they’re never cited, they also don’t show up when I call GET https://api.openai.com/v1/containers/{container_id}/files to list the files on the active container.

So my confusion is about how the backend decides whether or not to “cite” a file, and whether there is anything I can do to make this happen more reliably. Is there a special location or format I should be directing the LLM to use when saving these files?
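When citations do show up, I pull them out of the raw response JSON like this (a sketch mirroring the annotation shape in the docs excerpt above) and then fetch each file with the container-file-content call:

def collect_cited_files(response_json):
    # Walk the output messages and collect every container_file_citation
    # as (container_id, file_id, filename) so the files can be downloaded.
    cited = []
    for item in response_json.get("output", []):
        if item.get("type") != "message":
            continue
        for content in item.get("content", []):
            for ann in content.get("annotations", []) or []:
                if ann.get("type") == "container_file_citation":
                    cited.append((ann["container_id"], ann["file_id"], ann["filename"]))
    return cited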

As an example:

import matplotlib.pyplot as plt

# `tortuosity` is a pandas DataFrame built earlier in the session
# (columns: 'md' and 'tortuosity_deg_per_30m').

# Save to CSV in sandbox
csv_path = '/mnt/data/tortuosity_synthetic_data.csv'
tortuosity.to_csv(csv_path, index=False)

# Plot
plt.figure(figsize=(10, 4))
plt.plot(tortuosity['md'], tortuosity['tortuosity_deg_per_30m'], color='blue', linewidth=1.2)
plt.title('Tortuosity Synthetic Data')
plt.xlabel('Measured Depth (m)')
plt.ylabel('Dogleg Severity (° / 30 m)')
plt.grid(True, which='both', linestyle='--', alpha=0.4)
plt.tight_layout()
plt.show()

print(f"CSV saved to: {csv_path}")

Outputs

/home/sandbox/.local/lib/python3.11/site-packages/pandas/core/internals/blocks.py:2323: RuntimeWarning: invalid value encountered in cast
  values = values.astype(str)

[image]

CSV saved to: /mnt/data/tortuosity_synthetic_data.csv

This run correctly cited the produced image, which I was able to retrieve. However, the saved tortuosity_synthetic_data.csv was never cited, nor did it show up when I curled https://api.openai.com/v1/containers/{container_id}/files; only the image produced with plt.show() was available.


+1 on this, facing the same issue. To me, any generated files should be available in the container regardless of whether the model annotates them in the response.

Related to this: I’d like to re-use generated files across different sessions. Passing the container_id when making a Responses API request (reference) allows a previously created container to be re-used. However, even IF files are present, the model is unaware of them. E.g., if you ask it to print out the first 5 records, it will ask you to upload a file first.

This seems inconsistent: if you include file_ids without specifically telling the model about them, it is aware of their presence and loads them without any issue.

Considering the above, it’s unclear to me what the benefit of reusing a previously created container by specifying its ID would be (for follow-up messages on the same thread, the auto setting reuses the container automatically, so there is no issue in that case).
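For concreteness, these are the two configurations I’m comparing — a sketch only, assuming the tool config shapes from the Responses API reference (a bare container ID vs. an auto container with file_ids):

from openai import OpenAI

client = OpenAI()

# Variant A: reuse an existing container by ID. The files are still there,
# but the model doesn't act as if it knows about them.
resp_a = client.responses.create(
    model="gpt-4.1",
    tools=[{"type": "code_interpreter", "container": "cntr_..."}],  # placeholder ID
    input="Print the first 5 records of the CSV in /mnt/data.",
)

# Variant B: let the API create a container and attach uploaded files by ID.
# Here the model picks the files up without being told about them.
resp_b = client.responses.create(
    model="gpt-4.1",
    tools=[{
        "type": "code_interpreter",
        "container": {"type": "auto", "file_ids": ["file_..."]},  # placeholder ID
    }],
    input="Print the first 5 records of the attached CSV.",
)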


Hi @Brett_AB, I am facing the exact same issue. You are using the Agents SDK, right? I find that when my agent uses the CodeInterpreterTool, it does not actually create files, and they show up neither in the annotations of the output nor in the list container files endpoint. However, I did try directly with the Responses API and it worked as intended. It’s kind of frustrating that it works for Responses and not the Agents SDK; I’d like to stay consistent in my codebase and not have to create a custom @function_tool that calls the Responses API.
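For what it’s worth, this is roughly the shape of my Agents SDK setup — a sketch only; the CodeInterpreterTool constructor (the tool_config keyword and its dict shape) is my assumption about the current openai-agents API, so check your installed version’s signature:

import asyncio
from agents import Agent, CodeInterpreterTool, Runner

agent = Agent(
    name="analyst",
    instructions="Use the code interpreter; save any files you create under /mnt/data.",
    tools=[
        # Assumption: tool_config takes the same dict the Responses API expects.
        CodeInterpreterTool(
            tool_config={"type": "code_interpreter", "container": {"type": "auto"}}
        )
    ],
)

async def main():
    result = await Runner.run(agent, "Generate a small synthetic CSV and save it.")
    print(result.final_output)

asyncio.run(main())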

+1 same frustration as @Brett_AB


The issue is that:

  • The tool is implemented and described poorly
  • You are blocked from improving the internal tool language
  • Models assumed to be “trained” are not trained
  • Notebook state is self-deleting
  • You have no persistent file or image ability at all
  • A “container” is an internal convention you cannot access
  • A file listing method is incorrectly described
  • Files are locked behind only being “input” or “output”
  • Files are now further locked behind never being available unless annotated by AI
  • Files are also ephemeral, blob storage
  • The way to have the AI generate annotations is never described to you or the AI
  • The user invisibility of the code itself is not made apparent to the AI
  • Modules are loaded in incompatible combinations, with methods that can never work
  • Libraries expose show() and display() methods that can never display, because there is no presentation layer
  • The AI has no information about library modules it can employ, and must write code it will never autonomously write just to find out about them.
  • …

I could go on; this is barely a crib sheet for “why every feature and every internal tool stinks by design, and you should give up on Responses”.

I don’t even need to remark on the stupidity of an AI that loops on writing a 2+2 script because it thinks the notebook needs testing, rather than that its code sucks.


The solution is that you have to “system prompt” the AI that a markdown web link must be created for every file produced for the user, and that the URL written must be of the form:

[file_name.txt](sandbox:/mnt/data/file_name.txt)

Thus the chat gets infected with undesired output at your expense, when, with a non-sucky code tool, you could have a UI that natively shows newly appearing files, even in a file-system browser.
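Concretely, that means bolting something like this onto your own instructions and scraping the links back out yourself — a sketch; the instruction text, regex, and helper name are mine:

import re

SYSTEM_ADDENDUM = (
    "Whenever you create a file for the user with the code interpreter, save it "
    "under /mnt/data and include a markdown link of the form "
    "[file_name.txt](sandbox:/mnt/data/file_name.txt) in your reply."
)

SANDBOX_LINK = re.compile(r"\[([^\]]+)\]\(sandbox:(/mnt/data/[^)\s]+)\)")

def extract_sandbox_paths(message_text):
    # Returns (link text, /mnt/data path) pairs, which can then be matched
    # against the container's file listing and downloaded.
    return SANDBOX_LINK.findall(message_text)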
