Input_file unreliable on inlined (data:) content (post 2026-02 update)

In February, OpenAI updated the input_file API to handle a wide range of files: File inputs | OpenAI API

This seemed like a great improvement for interactions with attachments: previously only images and PDFs could be formally attached, and other types had to be converted and inserted in the user prompt with brittle tagging.

Simple test cases, however, show that the update is as brittle as what we had before, if not more so. For example:
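For readers who have not seen the older approach, here is a minimal sketch of what "converted and inserted in the user prompt with tagging" can look like. The helper names and the marker format are hypothetical choices made for illustration, not anything the API prescribes; only the `input_text` part type comes from the requests in this thread.

```python
def tag_file_as_text(filename: str, text: str) -> str:
    """Wrap file text in begin/end markers that carry the filename."""
    # The marker format is an arbitrary, hypothetical convention.
    return f"<<<BEGIN FILE: {filename}>>>\n{text}\n<<<END FILE: {filename}>>>"


def build_user_content(prompt: str, files: dict[str, str]) -> list[dict]:
    """Build one input_text part: tagged files first, then the question."""
    tagged = [tag_file_as_text(name, body) for name, body in files.items()]
    return [{"type": "input_text", "text": "\n\n".join(tagged + [prompt])}]


content = build_user_content(
    "What is the content of notes.txt?",
    {"notes.txt": "ALICE was beginning to get very tired"},
)
print(content[0]["text"])
```

The brittleness is visible here: nothing stops the file text itself from containing the same markers, so the model can only treat them as a hint.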

BASE64_DATA=$(echo "ALICE was beginning to get very tired of sitting by her sister on the bank" | base64 -w 0)

curl -X POST "https://api.openai.com/v1/responses" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{
    "model": "gpt-5.2",
    "input": [
      {
        "role": "user",
        "content": [
          {
            "type": "input_file",
            "filename": "document.txt",
            "file_data": "data:text/markdown;base64,'${BASE64_DATA}'"
          },
          {
            "type": "input_text",
            "text": "Do you see document.txt? What do you see from context?"
          }
        ]
      }
    ]
  }' | jq '.output[0].content[0].text'

Results in:

I don’t see any document.txt attached or accessible here—only the text you pasted:

> “ALICE was beginning to get very tired of sitting by her sister on the bank”

From context, that opening line is recognizably the start of Alice’s Adventures in Wonderland (Lewis Carroll), Chapter 1 (“Down the Rabbit-Hole”). It sets the scene: Alice is bored, sitting with her sister by a riverbank, right before she notices the White Rabbit and the story begins.

If you meant a separate document.txt, please upload it or paste its contents and I can analyze it.

This suggests that the file attachment is simply appended to the user prompt without any guard, and that the filename is completely ignored.

Adding multiple file attachments is even worse:

BASE64_DATA1=$(echo "ALICE was beginning to get very tired of sitting by her sister on the bank" | base64 -w 0)
BASE64_DATA2=$(echo "The house stood on a slight rise just on the edge of the village." | base64 -w 0)

curl -X POST "https://api.openai.com/v1/responses" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{
    "model": "gpt-5.2",
    "input": [
      {
        "role": "user",
        "content": [
          {
            "type": "input_file",
            "filename": "alice.txt",
            "file_data": "data:text/markdown;base64,'${BASE64_DATA1}'"
          },
          {
            "type": "input_file",
            "filename": "hitchiker.txt",
            "file_data": "data:text/markdown;base64,'${BASE64_DATA2}'"
          },
          {
            "type": "input_text",
            "text": "What is the content of hitchiker.txt? What do you see from context?"
          }
        ]
      }
    ]
  }' | jq '.output[0].content[0].text'

Results in:

I can’t actually see or read hitchiker.txt from here unless you paste its contents or upload the file.

From the context you provided, those two lines are famous opening sentences from two different novels:

- “ALICE was beginning to get very tired of sitting by her sister on the bank …” → Alice’s Adventures in Wonderland (Lewis Carroll)

- “The house stood on a slight rise just on the edge of the village.” → The Hitchhiker’s Guide to the Galaxy (Douglas Adams)

So, from context, hitchiker.txt likely contains (at least) the opening line of The Hitchhiker’s Guide to the Galaxy, and possibly it’s a text file of notable first lines (including Alice in Wonderland), or a file intended to contain an excerpt/opening from Hitchhiker’s Guide.

If you paste the file text (or upload it), I can tell you exactly what hitchiker.txt contains.

The filename appears to be completely ignored, yet if we try to pass the input_file without it, we get:

{
  "error": {
    "message": "Missing required parameter: 'input[0].content[0]'.",
    "type": "invalid_request_error",
    "param": "input[0].content[0]",
    "code": "missing_required_parameter"
  }
}

So filename is required… Is anyone able to confirm that this is the expected behaviour of input_file?

Hi @gpetit, welcome back!

Yes, that matches what I see with attached text files as well, and thank you for sharing such clear reproduction steps.

  • filename is currently required for validation.
  • However, the model does not appear to receive usable attachment semantics tied to that filename.
  • For text files, raw input_file currently behaves more like text inserted directly into context than like a real named attachment the model can reference by filename.
  • I was able to confirm this with both inline Base64 file_data and uploaded Files API file_id inputs.
  • And yes, this behavior makes multi-file cases ambiguous.

If you need reliable multi-file retrieval by document identity today, File Search is currently the more dependable option, although it is not a direct replacement for simple attachment semantics in every workflow.

Thanks for confirming. Using file_id (or file_url) is not an option in this case, since we are aiming for statelessness. I tried file_url with a data URL, e.g. "file_url": "data:text/plain;base64,'"${BASE64_DATA1}"'", and unfortunately it is not supported either. It results in:

{
  "error": {
    "message": "Failed to download file from data:text/plain;base64,QUxJQ0Ugd2FzIGJlZ2lubmluZyB0byBnZXQgdmVyeSB0aXJlZCBvZiBzaXR0aW5nIGJ5IGhlciBzaXN0ZXIgb24gdGhlIGJhbms=.",
    "type": "invalid_request_error",
    "param": "url",
    "code": "invalid_value"
  }
}
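One way to rule out a malformed payload is to check that the URL quoted in the error decodes back to the test sentence. The sketch below does that round trip; `make_data_url` and `parse_data_url` are hypothetical helper names, and the parser only handles the simple `data:<mime>;base64,<payload>` form used in this thread.

```python
import base64


def make_data_url(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def parse_data_url(url: str) -> tuple[str, bytes]:
    """Split a simple base64 data URL into (mime_type, decoded bytes)."""
    header, _, payload = url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


sentence = b"ALICE was beginning to get very tired of sitting by her sister on the bank"
url = make_data_url(sentence, "text/plain")
mime_type, decoded = parse_data_url(url)
print(url)
print(mime_type, decoded == sentence)
```

The generated URL matches the one in the error message, so the request itself looks well formed. That is consistent with the endpoint trying to fetch the value as a remote URL rather than decoding it, although the error alone does not prove it.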

The file attachment extraction is a surface for you to develop an app upon.

Revealing file names to the AI with additional language you do not control is absolutely not what you want or should want.

It would be a pattern that enforces a limited imagination. Whatever “filename” is here, the AI shouldn’t even perceive that a .md file is being used.

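# Python-style request body; msgbox and agentic_postprompt are application-side objects
{
    "model": "gpt-5.5",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": msgbox.get()
                },
                {
                    "type": "file",
                    "file": {
                        # underscore instead of hyphen so the name is a valid identifier
                        "filename": agentic_postprompt["tier3"]["path"],
                        "file_data": agentic_postprompt["tier3"]["b64_md"],
                    }
                },
            ]
        }
    ]
}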

In a similar, more obvious context, you might want to label each image with a file name in an application that interleaves text and images, or you may want to completely hide any underlying filename or metadata from the AI.

Further tests reveal that PDF files do get the provided filename passed through, but so far I could find no other format that did (tested rtf, docx, pptx, md, cpp).

I can understand that one might want to use a .txt or .md through input_file as a prompt source replacement, but a PowerPoint presentation as a raw prompt - my imagination is probably too limited to comprehend the use case…

You may not believe it, but if you put the input_text at position 0 and append the input_file items after it, the model can read the file. The docs just show the wrong ordering.
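A minimal sketch of that ordering, for anyone who wants to retry the tests above: the question goes first and the attachments follow. `build_content` is a hypothetical helper, the part fields are the ones used in the curl requests earlier in the thread, and whether this ordering actually fixes filename visibility is the poster's report, not something this snippet verifies.

```python
import base64


def build_content(prompt: str, files: dict[str, bytes],
                  mime_type: str = "text/plain") -> list[dict]:
    """Return content parts with input_text at position 0, then the files."""
    parts = [{"type": "input_text", "text": prompt}]
    for filename, data in files.items():
        encoded = base64.b64encode(data).decode("ascii")
        parts.append({
            "type": "input_file",
            "filename": filename,
            "file_data": f"data:{mime_type};base64,{encoded}",
        })
    return parts


request_body = {
    "model": "gpt-5.2",
    "input": [{
        "role": "user",
        "content": build_content(
            "What is the content of hitchiker.txt?",
            {
                "alice.txt": b"ALICE was beginning to get very tired",
                "hitchiker.txt": b"The house stood on a slight rise",
            },
        ),
    }],
}
print([part["type"] for part in request_body["input"][0]["content"]])
```

The resulting dict can be serialized with `json.dumps` and sent as the body of the same POST to /v1/responses, which keeps the request stateless.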