Invalid input: Expected file type to be a supported format: .pdf but got .docx

When I use file search with new Response API and instead of PDF i use docx in vector store i get error “Invalid input: Expected file type to be a supported format: .pdf but got .docx.”

With PDF new response api works but not with docx

However documentation mentions vector store supports docx.

Update:
Just wanted to add here incase it hepls anyone:
Basically I was adding my tools like this:

{‘type’: ‘file_search’, ‘vector_store_ids’: [‘vs_id’]}

And also I was adding file id to my content array:
“content”: [
{
“type”: “input_file”,
“file_id”: http://file.id,
},
{
“type”: “input_text”,
“text”: “What is the file about?”,
},
]

Removing from content array worked. Otherwise it was just giving api error and no request id. I have got all working just waiting for code interpreter availability then i can release it to prod.

2 Likes

I have the same problem! I am able upload all type of files via https://api.openai.com/v1/files and it gives a file_id.

however, when I ask via the API t convert into Markdown it only works with .pdf but not for .docx and .doc, while the front-end does.

Anybody a clue?

confirmed. The playground file upload allows me to upload a text file for example and I can ask questions about it in the same query. However programatically I get this error.

input

{ 'model': 'gpt-4o-mini', 'input': [{'role': 'user', 'content': [{'type': 'input_file', 'file_id': 
'file-SYjmkPFtqptWYLy8dDLt3D'}, {'type': 'input_text', 'text': 'summarise the release process for alleycat'}]}], 'temperature': 0.7, 'instructions': 'The 
user has attached a file for you to analyze.'}

error:

{'error': {'message': 'Invalid input: Expected file type to be a supported format: .pdf but got .md.', 'type': 
'invalid_request_error', 'param': 'input', 'code': None}}

I tested the examples from openai with different PDF files.
https://platform.openai.com/docs/guides/pdf-files?api-mode=responses

Result: Some PDF’s work, others don’t

My use case: I wanted to use openai for reading my PDF and producing summaries because the VectorStore does not understand PDFs that consists only of an image, so no text-retrieval from PDF but really do OCR. And exactly the PDFs that did not work with PDF Splitters around cannot be read by the API. Additional: if I do it manually in the chatGPT UI it works as expected but not using the file-uploads from openAI’s file API.

Let me know if you find out more!
Regards,
Michael

1 Like

If I try jpg it states immidiatly
message: ‘Invalid input: Expected file type to be a supported format: .pdf but got .jpg.’,

So somehow the file upload from the API works different from the examples approach.

This API topic seems very easy to become confused in, or to think you are doing one thing and be doing the other.

There’s two possibilities.

  1. Direct file attachment of a PDF from files endpoint id to a user message
  • this uses both text extraction and vision, placing the whole file contents (and possibly exceeding context if too large)
  1. Using file_search tool, in combination with a vector store
  • where vector store file attachment is where you would encounter issues.

The symptom is “dumb file inspection” — which is also the cause.

Direct file attachment of PDF-Files is not working accurate. If I attach a PDF that has only a image in it with no text, e.g. if you printed out something with “save PDF” than it fails. That’s what I think I found out but maybe someone can test it as well? I get the error from the API that it cannot access the PDF. I cannot upload an example PDF here.

“Invalid input: Expected file type to be a supported format: .pdf but got .jsonl.”, “type”: “invalid_request_error”

I get the same issue but with jsonl files that I aim to use in the batch API. The docs say that the batch API requires jsonl.

We’re trying to use docx/doc in the responses API via a file id. I asked the OpenAI support bot and here is what it said:

Hi!

I’m an AI support agent and happy to help. Currently, OpenAI’s API only supports certain file formats for reading and extracting content.

For the responses API and tools like the Assistants API, only PDFs are accepted for document analysis or referencing. When you upload a .docx or .doc file and try to use its file ID, you’ll see an error like the one you posted, stating that only PDF is supported.

How to proceed:
- Convert your Word documents (.docx/.doc) to PDF format before uploading.
- Upload the resulting PDF using the files API and reference the new file ID in your request. If you have any additional questions about file handling or supported formats, let me know!