Missing File Annotation for Subsequent Files in Code Interpreter

When using the OpenAI API with the code_interpreter tool, the first generated file is correctly annotated and downloadable via a file_id. However, any subsequent files requested in the same container are created but not annotated in the response. Without annotation, these later files cannot be retrieved programmatically through the API’s file endpoints.

Steps to Reproduce

  1. Ask for the first file:
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    include=["code_interpreter_call.outputs"],
    model="gpt-5",
    input="Give me an excel file with the number 42 in it.",
    stream=True,
    tools=[{"type": "code_interpreter", "container": {"type": "auto"}}],
)
  • :white_check_mark: .xlsx file is created.
  • :white_check_mark: Response includes annotation with a valid file_id.
  • :white_check_mark: File is downloadable via the API.
  2. Ask for another file in the same container:
response = client.responses.create(
    include=["code_interpreter_call.outputs"],
    model="gpt-5",
    input=[
        {"role": "user", "content": "Give me an excel file with the number 42 in it."},
        {"role": "assistant", "content": "Here you go. [Download the Excel file](sandbox:/mnt/data/number_42.xlsx)"},
        {"role": "user", "content": "Now give me the same as a CSV file."}
    ],
    stream=True,
    tools=[{"type": "code_interpreter", "container": "cntr_..."}],
)
  • :warning: Second file (e.g., .csv) is generated.
  • :warning: Response only contains a sandbox-style link (sandbox:/mnt/data/number_42.csv).
  • :warning: No annotation is present (annotations: []).
  • :warning: Container listing shows the file exists.
  • :cross_mark: Attempting to fetch the file via file_id results in 404 Not Found.

I’ve just spent hours trying to program around this and came here to complain about it and found your post. I searched on “annotations” and found numerous recent posts complaining about the same issue. Examples:

Downloading a file generated by code_interpereter tool - API - OpenAI Developer Community

Code interpreter generated files path bug - API / Bugs - OpenAI Developer Community

Reliably retrieving code interpreter files from the container? - API - OpenAI Developer Community

Missing File Annotation for Subsequent Files in Code Interpreter - API / Bugs - OpenAI Developer Community

I’ve found whether an annotation is generated depends in some (to me) random way on how the prompt is worded.

This appears to be a serious bug in the Responses/Code Interpreter system. The docs say an annotation will be returned in the text response for every file generated but …. ???


I agree this is a known issue, and the only reliable way is to actually traverse the messages and pick up all possible files (interpreter and image gen as well). They show up in annotations sometimes, and sometimes with the ‘file name’ but not an actual id. So right now you cannot rely on file annotations.

That said. I would add that in my experience only 4o is reliably creating charts for a report prompt I have. None of the later ones can do it reliably.


Not sure I understand “traverse the messages and pick up ….”. I’m looking at all output text messages and all “code_interpreter_call” objects in the output list. I find files in the code object with a file_id and a path, but the ‘bytes’ value is always “None” and it can’t be downloaded. Sometimes the path is exactly what I expect and sometimes it is a weird concatenation of two paths (which makes no sense).

For context: I’m doing Responses API calls to gpt-5-mini using the web search and code tools.

I can prompt it to successfully produce a downloadable PDF file, but if I follow up with a prompt to modify that file I get the non-downloadable response I just described, although the text output says it created the requested modified PDF and even describes features of it! Very discouraging!

I am dealing with the same problem. When I just try to generate one file it works fine (mostly) and the file is returned in the message annotation. But as soon as I try to generate more than one file at once, the annotations will be empty. Sometimes I get the annotation for the last generated file.

I just had a long chat with the support bot. I was impressed with how intelligently it crafted its responses, although needlessly verbose and somewhat obtuse on some aspects.

Anyway, the TL;DR on the chat was this quote from the bot:

No current prompt or instruction can force the API to provide structured file annotation or direct download references if the backend response format doesn’t support it. The presence of a markdown link in the generated text is not the same as a programmatically accessible file reference. This is a known behavior and limitation.

It also said:

you’re absolutely right, and this distinction is important. The official documentation for the Code Interpreter and file outputs does claim that an annotation (reference) is created for every file generated. However, in practice, this is not always the case, especially when using the API. Many developers (like yourself) have noted that file outputs may not show up as structured fields in API responses, and markdown links in the response text aren’t functionally usable download URLs for programmatic access. This leads to confusing situations where the UI may hint at a file, but the API doesn’t expose a way to download it. This is a known issue/limitation: the documentation describes the expected behavior, but the actual implementation can be inconsistent, particularly for non-primary or subsequent files, or certain file types. To summarize:

  • The docs state every file should get an annotation/reference.

and:

You are correct to call this a “bug” from a developer standpoint; OpenAI refers to it as a limitation, and it is recognized both in documentation and community reports.

I argue the docs do NOT reflect that limitation, but I suppose we’re getting into trivia here.

The bot suggested workarounds like having the first prompt request a json of the info I desire in the PDF then another turn with a request to create the PDF based on the following json (followed by cut and paste of the json text returned by the first turn). This works but it’s tedious. I sure hope OpenAI eliminates this “limitation”.


Update: I have a workaround that has worked (at least several times) for me:

I’m calling the responses API with web search and code gen tools, using GPT-5-mini. A typical case is to prompt for a list that requires web searches and the creation of a PDF containing the list.

My code looks at the “output” items in the “response” object returned by the responses API call.

There are 3 types of output items: reasoning, code interpretation, and text output. If the type of the output content is “output_text”, I look in the “annotations” for an annotation of type “container_file_citation”. If present I get the id’s needed to download the file. (Of course the problem is the required annotation is frequently NOT present, the topic of this thread.)
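The check described above can be sketched roughly as follows. This is only an illustration, not the poster's actual code: it assumes the response has already been converted to plain dicts (for example via `response.model_dump()["output"]`), that text output arrives in items of type `message`, and that the annotation carries `container_id`, `file_id` and `filename` fields. The helper names are hypothetical.

```python
from __future__ import annotations

from typing import Any


def find_file_citations(output_items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Collect container_file_citation annotations from output_text parts.

    Assumes `output_items` is the response's output list as plain dicts.
    """
    citations = []
    for item in output_items:
        if item.get("type") != "message":
            continue
        for part in item.get("content") or []:
            if part.get("type") != "output_text":
                continue
            for ann in part.get("annotations") or []:
                if ann.get("type") == "container_file_citation":
                    citations.append(
                        {
                            "container_id": ann.get("container_id"),
                            "file_id": ann.get("file_id"),
                            "filename": ann.get("filename"),
                        }
                    )
    return citations


def find_container_ids(output_items: list[dict[str, Any]]) -> list[str]:
    """Container ids from code_interpreter_call items, in order, without repeats."""
    seen: list[str] = []
    for item in output_items:
        if item.get("type") == "code_interpreter_call":
            container_id = item.get("container_id")
            if container_id and container_id not in seen:
                seen.append(container_id)
    return seen
```

An empty result from `find_file_citations` is the failure case this thread is about; the container ids are what the fallback below that point relies on.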

I also look at the output items of type ‘code_interpreter_call’. For each item I get the container_id, then call a python function that uses the endpoint to list files in a container:

(https://api.openai.com/v1/containers/{container_id}/files)

The returned list object(s) have a content.data file list that includes ‘bytes’, ‘id’ and ‘path’. Frequently there are items with a correct-sounding (sandbox/mnt) path to a file and an id (file_id), but the bytes entry will be ‘None’. Previously I thought that meant any attempt to download the file (using the appropriate endpoint) would fail. That was wrong. Apparently the download failures I had seen were actually because the ‘path’ value was ill-formed, which does happen occasionally. If you go ahead and do the file download, it (so far) succeeds.
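A minimal sketch of that list-then-download fallback, using only the standard library so it does not depend on a particular SDK version. The list URL is the one quoted above; the `/content` download URL and the top-level `data` array are assumptions based on the public API reference, and the function names and the injectable `opener` parameter are made up for illustration. Pagination is ignored.

```python
import json
import urllib.request

API_BASE = "https://api.openai.com/v1"


def _get(url, api_key, opener=urllib.request.urlopen):
    """GET a URL with a bearer token and return the raw response bytes."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with opener(req) as resp:
        return resp.read()


def list_container_files(container_id, api_key, opener=urllib.request.urlopen):
    """Return the file objects (dicts with id, path, bytes, ...) for a container."""
    raw = _get(f"{API_BASE}/containers/{container_id}/files", api_key, opener)
    return json.loads(raw).get("data", [])


def download_container_file(container_id, file_id, api_key, opener=urllib.request.urlopen):
    """Fetch a file's content; only the container id and file id are needed."""
    url = f"{API_BASE}/containers/{container_id}/files/{file_id}/content"
    return _get(url, api_key, opener)
```

Because the download call takes only the two ids, a `bytes` value of `None` in the listing does not by itself prevent trying the download.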

More than one code_interpreter_call file item can be returned after a multi-turn sequence, e.g., after prompting to add something to the generated PDF. In that case the desired output file is the first one returned.

Of course if the response returns an output_text annotation with file info, I download based on that and ignore the file info returned by the code_interpreter call.

I hope this will help those having this problem.

Note: cases where the code interpreter returns ill-formed ‘path’ values can occur and downloads will most likely fail then, although the parameters passed to the download-file-content end point include only the container and file id’s.


Don’t rely on file annotations. Get the entire list of files after every request is completed using container files api. Gather all “Non-User” files (check the type or source field of the file objects) as these types of files were created by the code_interpreter. Detect if you have downloaded them previously (keep the filename the same as the path in the container), and if you haven’t downloaded them do so but keep the filename the same. Download and save using the getContents endpoint. The filename is the last portion of path string. Create an array that maps the container file path to your download location. Use this array to replace the containers file path string in the final response with the downloaded path.

Think sandbox:mt/xx/filename.pdf → pathToDownload/filename.pdf

The key is to get all the file objects every time and create the mapping for every file. Then loop through that array and replace any occurrence of the sandbox path with your path.
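For Python users, the mapping-and-replace step might look something like the sketch below. It assumes each file object is a dict with a `path` and a `source` field, and that files uploaded by the caller have `source == "user"`; the function names and the `downloads` directory are hypothetical.

```python
import posixpath
from pathlib import Path


def generated_files(files):
    """Keep only files that were not uploaded by the user."""
    return [f for f in files if f.get("source") != "user"]


def build_link_map(files, download_dir):
    """Map each container path to a local path that keeps the same filename."""
    mapping = {}
    for f in generated_files(files):
        # The filename is the last portion of the container path.
        name = posixpath.basename(f["path"])
        mapping[f["path"]] = str(Path(download_dir) / name)
    return mapping


def rewrite_links(text, mapping):
    """Replace sandbox:<container path> links in the text with local paths."""
    # Longest paths first, so a path that is a prefix of another is handled last.
    for container_path in sorted(mapping, key=len, reverse=True):
        text = text.replace("sandbox:" + container_path, mapping[container_path])
    return text
```

Rebuilding the mapping from the full file list after every request means a link the model repeats in a later turn can still be rewritten, even though no annotation comes with it.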

If you are streaming to a client, you need to send a new type of server event at response completion for the link_update, with a JSON object containing the array of file mappings, and then replace the URL in the client text. If you asked it to return markdown, you should change the URL/path in JavaScript before you convert the markdown into an HTML link.
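On the server side, such an event could be formatted along these lines if the stream to the client uses the standard server-sent events wire format. The `link_update` event name comes from the description above; the `links` key in the payload and the function name are assumptions for illustration.

```python
import json


def format_link_update_event(mapping):
    """Format a server-sent event carrying the container-path to local-path map."""
    # json.dumps emits a single line by default, which keeps it inside one data: field.
    payload = json.dumps({"links": mapping})
    return f"event: link_update\ndata: {payload}\n\n"
```

The client would listen for `link_update` and apply the replacements to the text it has already rendered.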

Oh, and yes, this is janky on OpenAI’s part, but it is much more robust in my experience. It also lets you ask the AI for the links again, because it WILL NOT add an annotation for a file unless it creates it. It WILL provide links to the files in markdown if asked, though, obviously with no corresponding annotation.

Thanks but basically you lost me right away with this:

“Gather all “Non-User” files and detect if you have downloaded them (keep the filename the same as the path in the container), if you haven’t do so. Then create an array that maps the container path to your download location. Use this with a str_replace on the response to update link location.”

What are non-user files and how do I gather them? Is this using the end point that gets the list of files in a container? What is my link location?

I’m not using JavaScript. I’m calling the API from Python. I suspect we’re thinking in such different contexts that it’s almost like a language problem. All I know is the response object sometimes has code_interpreter output objects that say a file that I asked for was generated, and the output_text will say that too, but the file is not annotated and I can’t find it anywhere in the output list of the response object.

I updated my answer; let me know if I can clarify more.

Just wanted to plug in this thread here and especially the acknowledgement for OpenAI at the end.

Yes, think I get it now. Thanks.