Is there any way to directly access the files created at /mnt/data in the code interpreter sandbox or otherwise reliably retrieve files created by the code interpreter?
For example, it would be great to create a plot, direct the llm to save a csv used to create the plot, and be able to retrieve both.
The docs claim that:
When running Code Interpreter, the model can create its own files. For example, if you ask it to construct a plot, or create a CSV, it creates these images directly on your container. When it does so, it cites these files in the
annotations
of its next message. Here’s an example:{ “id”: “msg_682d514e268c8191a89c38ea318446200f2610a7ec781a4f”, “content”: [ { “annotations”: [ { “file_id”: “cfile_682d514b2e00819184b9b07e13557f82”, “index”: null, “type”: “container_file_citation”, “container_id”: “cntr_682d513bb0c48191b10bd4f8b0b3312200e64562acc2e0af”, “end_index”: 0, “filename”: “cfile_682d514b2e00819184b9b07e13557f82.png”, “start_index”: 0 } ], “text”: “Here is the histogram of the RGB channels for the uploaded image. Each curve represents the distribution of pixel intensities for the red, green, and blue channels. Peaks toward the high end of the intensity scale (right-hand side) suggest a lot of brightness and strong warm tones, matching the orange and light background in the image. If you want a different style of histogram (e.g., overall intensity, or quantized color groups), let me know!”, “type”: “output_text”, “logprobs”: } ], “role”: “assistant”, “status”: “completed”, “type”: “message” }
You can download these constructed files by calling the get container file content method.
However I’ve found that whether or not these files are actually cited or are otherwise available after the run is a bit of a crapshoot.
If they’re never cited, they also don’t show up when I run https://api.openai.com/v1/containers/{container_id}/files to list the files on the active container.
So my confusion is in how the backend determines whether or not to “cite” the file and whether or not there is anything I can do to make this happen more reliably. Is there a special place or format I should be directing the llm to save these?
As an example:
# Save to CSV in sandbox
csv_path = '/mnt/data/tortuosity_synthetic_data.csv'
tortuosity.to_csv(csv_path, index=False)
# Plot
plt.figure(figsize=(10,4))
plt.plot(tortuosity['md'], tortuosity['tortuosity_deg_per_30m'], color='blue', linewidth=1.2)
plt.title('Tortuosity Synthetic Data')
plt.xlabel('Measured Depth (m)')
plt.ylabel('Dogleg Severity (° / 30 m)')
plt.grid(True, which='both', linestyle='--', alpha=0.4)
plt.tight_layout()
plt.show()
print(f"CSV saved to: {csv_path}")
Outputs
/home/sandbox/.local/lib/python3.11/site-packages/pandas/core/internals/blocks.py:2323: RuntimeWarning: invalid value encountered in cast
values = values.astype(str)
[image]
CSV saved to: /mnt/data/tortuosity_synthetic_data.csv
This correctly cited the produced image, which I was able to retrieve, however the saved tortuosity_synthetic_data.csv was never cited, nor was it available through a curl https://api.openai.com/v1/containers/(randomcontainernumber)/files hit, only the image produced with plt.show() was available.