Files API — what formats are supported?

No clear list in the documentation — https://platform.openai.com/docs/api-reference/files/create

.pdf works. But .csv results in error — “Expected file type to be a supported format: .pdf but got .csv.”

Can I only upload the .pdf file extension?

1 Like

What purpose did you specify?

from openai import OpenAI
client = OpenAI()

client.files.create(
  file=open("mydata.csv", "rb"),
  purpose="user_data"
)

“user_data”, too

so we can upload any file, but only reference .pdf in responses API?

https://platform.openai.com/docs/guides/pdf-files?api-mode=responses

Yes, file is just the data. On input type you specify what it is.

https://platform.openai.com/docs/guides/pdf-files?api-mode=responses

To clarify:

There are several destinations for files on the files endpoint, with their own “purpose”.

[1] assistants - Docs for search or assistants’ code interpreter (upload)
[2] assistants_output - Files produced by assistant or code (download)
[3] user_data - User-provided data, general (upload)
[4] fine-tune - JSONL training file (upload)
[5] fine-tune-results - Learning metrics report (download)
[6] batch - JSONL of API calls to batch (download, upload)
[7] batch_output - Fulfilled batch API calls (download)
[8] vision - Images for Assistants message attachment (upload)
[9] evals - batch of model tests (upload/download?)

What will be accepted as file format (and the level of inspection done on it) depends on the purpose.

If the destination is a vector store, as either “assistants” or “user_data”, the documentation has a list of accepted file types, notably rejecting formats like CSV or JSON that are not good for knowledge retrieval. Then further validation upon connecting to a vector store.

Other file ID use can be including a file into the code interpreter mount point container, also with its own rejections.

What is unique, though, is attaching a file as a content part of a user message. This only accepts PDF, and uses different (potentially unreliable) technique to extract the full text and also rendered page images, without any searching.