BUG: Responses API with structured output AND Code Interpreter does not provide annotations?

I am excited to have Code Interpreter in Responses, finally allowing a ‘full’ migration from the Assistants API. But I am running into a problem: unless the output format is text, there seems to be no consistent way to retrieve the files Code Interpreter creates.

The easiest way to test/debug is this prompt: “Create a chart that plots 1, 2, 4, 8 against equal x steps. Output an annotated message that has the file name in it.” (Enable Code Interpreter and set the output format to ‘text’.) This works fine: you will see the file show up as part of the code interpreter message and also in the text message that follows.

Now if you change the output format to ‘json’ (and add the output directive), there is no file attached to either message — neither in the final output nor in the interpreter session message. (Prompt: “Create a chart that plots 1, 2, 4, 8 against equal x steps. Output the file as chart.png and create a nice title in the output JSON attribute ‘title’.”)
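
For anyone who wants to reproduce this from code rather than the playground, here is a minimal sketch of the two calls; the model name and the JSON schema are illustrative, not taken from my playground runs:

    # Minimal repro sketch (OpenAI Python SDK; model and schema are illustrative)
    from openai import OpenAI

    client = OpenAI()
    code_interpreter = {"type": "code_interpreter", "container": {"type": "auto"}}

    # Output format 'text': the generated file shows up in the annotations
    works = client.responses.create(
        model="gpt-4o",
        tools=[code_interpreter],
        input="Create a chart that plots 1, 2, 4, 8 against equal x steps. "
              "Output an annotated message that has the file name in it.",
        text={"format": {"type": "text"}},
    )

    # Output format JSON: same task, but no file annotation comes back
    fails = client.responses.create(
        model="gpt-4o",
        tools=[code_interpreter],
        input="Create a chart that plots 1, 2, 4, 8 against equal x steps. "
              "Output the file as chart.png and create a nice title in the "
              "output JSON attribute 'title'.",
        text={"format": {"type": "json_schema", "name": "chart_result", "strict": True,
                         "schema": {"type": "object",
                                    "properties": {"title": {"type": "string"}},
                                    "required": ["title"],
                                    "additionalProperties": False}}},
    )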

I have retested this with several models; only 4o is able to do this correctly most of the time. All other models, including 4.1 and o3, fail. I have flagged a lot of playground outcomes as either ‘good’ or ‘bad’ based on the result — hope that helps. For now the workaround is to simply choose text output, in which case it almost always works across models, with gpt-4o still producing the best response.


I am having a similar issue. I am not able to find the output of the executed code regardless of how I ask for it. This is my code:

  from openai import OpenAI

  client = OpenAI()

  stream = client.responses.create(
      model="o3",  # placeholder: the original snippet omits the model
      stream=True,
      tools=[{
          "type": "code_interpreter",
          "container": {
              "type": "auto",
              "file_ids": ["file-TXT3RH5yycr7MAX2H8kLvq"]
          }
      }],
      input=[{
          "role": "user",
          "content": [{"type": "input_text", "text": "Run python code that shows exactly how many columns this spreadsheet has and what the name for each column is."}]
      }],
      reasoning={"effort": "medium", "summary": "auto"},
      text={"format": {"type": "text"}},
  )
  # iterate the stream to drive the request; the final Response object
  # arrives in the "response.completed" event

Do you know how to access the file content and the output of the executed code? @jlvanhulst

I go through all the output messages to pick out the code interpreter calls; then, with the container_id, I list the files:

    self.output_files = []
    # collect the files from the code interpreter calls and the text response object if it exists
    for output in self.raw_response.output:
        if output.type == "message":
            for item in output.content:
                if item.type == "output_text":
                    self.text_response_object = item
        elif output.type == "code_interpreter_call":
            # list every file the container holds after this call
            files = await client.containers.files.list(container_id=output.container_id)
            for file in files.data:
                self.output_files.append({"file_id": file.id, "container_id": output.container_id})

(I just copy-pasted this; the ‘self’ is my Prompt class.) I then use
await client.containers.files.retrieve() to get the files.
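
Spelled out, the download step looks roughly like this. This is a sketch, not tested code: retrieve() returns the file metadata, and my assumption from the container-files docs linked below is that the raw bytes come from the separate content endpoint; “chart.png” is a placeholder filename.

    # Sketch: fetch metadata, then the raw bytes (the .content.retrieve call is an
    # assumption based on the container-files API reference)
    for entry in self.output_files:
        meta = await client.containers.files.retrieve(
            entry["file_id"], container_id=entry["container_id"]
        )
        blob = await client.containers.files.content.retrieve(
            entry["file_id"], container_id=entry["container_id"]
        )
        with open("chart.png", "wb") as f:  # placeholder filename
            f.write(blob.read())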

See https://platform.openai.com/docs/api-reference/container-files/listContainerFiles and https://platform.openai.com/docs/api-reference/container-files for the details.

Thank you for that.

There seems to be an easier way to do this by enabling

include=["code_interpreter_call.outputs"],

Where would you add that?

Check out the issue I opened.

You simply add it to your request like:

  reasoning={"effort": "medium", "summary": "auto"},
  text={"format": {"type": "text"}},
  include=["code_interpreter_call.outputs"],
  max_output_tokens=32000

Interesting: looking at the SDK source, only these values would work (for the Responses API):


ResponseIncludable: TypeAlias = Literal[
    "file_search_call.results",
    "message.input_image.image_url",
    "computer_call_output.output.image_url",
    "reasoning.encrypted_content",
]

Thanks for the report! I see the problems; we’ll work on making the citations more reliable regardless of text/JSON mode.


Great! Make sure to add tests for running two consecutive code interpreter sessions: 4o does best but has yet to ever return more than one file (two are expected, one from each session).
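
If it helps, one way to script that scenario is two chained requests, each expected to produce one file (a sketch; the model and the chart_a.png / chart_b.png names are illustrative):

    # Two consecutive code interpreter sessions, chained with previous_response_id
    first = client.responses.create(
        model="gpt-4o",
        tools=[{"type": "code_interpreter", "container": {"type": "auto"}}],
        input="Create chart_a.png plotting 1, 2, 4, 8 against equal x steps.",
    )
    second = client.responses.create(
        model="gpt-4o",
        previous_response_id=first.id,  # continue the same conversation
        tools=[{"type": "code_interpreter", "container": {"type": "auto"}}],
        input="Now create chart_b.png plotting 8, 4, 2, 1.",
    )
    # Expected: two files, one per session; observed: at most one ever comes back.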
