I think the supported file extensions are case sensitive. Might that explain your situation? It did for me.
When I used the Python SDK to upload a file with the extension .PDF to a vector store, it failed with this error:
```text
Exception has occurred: BadRequestError
Error code: 400 - {'error': {'message': 'Files with extensions [.PDF] are not supported for retrieval. See https://platform.openai.com/docs/assistants/tools/file-search/supported-files', 'type': 'invalid_request_error', 'param': 'file_ids', 'code': 'unsupported_file'}}
```
However, when I changed the file extension to lowercase .pdf, it worked without any errors.
I wonder if OpenAI intended this behavior or if it’s a bug.
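A minimal sketch of the workaround from this post, assuming you can rename the file (or the in-memory stream) before uploading: lowercase only the extension and leave the rest of the name alone. `normalize_extension` is a hypothetical helper, and the actual SDK upload call is left out.

```python
import os
from io import BytesIO


def normalize_extension(filename: str) -> str:
    """Return filename with only its extension lowercased (e.g. '.PDF' -> '.pdf')."""
    root, ext = os.path.splitext(filename)
    return root + ext.lower()


# Hypothetical usage: an in-memory file whose original name has an uppercase extension.
stream = BytesIO(b"%PDF-1.4 placeholder bytes")
stream.name = normalize_extension("Quarterly Report.PDF")
# stream can now be passed to whichever SDK upload call you already use.
```

Only the extension is touched, so a name like "Quarterly Report" keeps its original capitalization.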
Thanks. Via the support chat I got a response saying this was something their engineering team was looking into (but I have a feeling it was auto-generated…)
I think it’s just a simple hard-coded list of extension types they accept, and it may indeed be case sensitive.
It’s unfortunate that the check can’t properly handle additional parameters in the file URL, where the file type isn’t at the end.
Hopefully they’ll add code that parses the entire path to find the actual filename and type…
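For illustration, a sketch of pulling the filename and type out of a URL while ignoring query parameters, using only Python's standard library (`urllib.parse`, `posixpath`). The helper names and the example URL are hypothetical; this is not how OpenAI's own check works.

```python
import posixpath
from urllib.parse import unquote, urlparse


def filename_from_url(url: str) -> str:
    """Return the last path segment of a URL, ignoring any ?query or #fragment."""
    path = urlparse(url).path  # urlparse splits the query string and fragment off the path
    return unquote(posixpath.basename(path))


def extension_from_url(url: str) -> str:
    """Return the lowercased extension of the file named in the URL ('' if none)."""
    return posixpath.splitext(filename_from_url(url))[1].lower()


url = "https://example.com/files/Annual%20Report.PDF?token=abc&v=2"
print(filename_from_url(url))   # Annual Report.PDF
print(extension_from_url(url))  # .pdf
```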
I ran into this issue in my local dev environment, using Koa and the Node SDK. The koa-body defaults for the upload location and filenames need to be adjusted so the saved files don’t end up with default names that lack a usable extension.
I’m doing a bit more than strictly required, just to give myself a few points where I can step in and run checks. I’m also referencing a number of internal variables, so this code won’t run out of the box.
Anyhow, the important part is to swap out the path and newFilename for whatever extension-filled lifestyle you choose to lead.
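The fix above is koa-body configuration in Node. Since the rest of this thread uses Python, here is a rough Python analog of the same idea, assuming your server has already saved the upload under a random name with no extension: rename the saved file so it carries the client's original (lowercased) extension before handing it to the SDK. `keep_original_extension` and the sample file names are hypothetical.

```python
import tempfile
from pathlib import Path


def keep_original_extension(saved_path: str, original_filename: str) -> Path:
    """Rename a saved upload so it ends with the client's original extension, lowercased.

    Returns the new path; if the original name has no extension, the file is left alone.
    """
    src = Path(saved_path)
    ext = Path(original_filename).suffix.lower()
    if not ext:
        return src
    # Path.rename returns the new Path (Python 3.8+)
    return src.rename(src.with_suffix(ext))


# Hypothetical usage: the server saved the upload under a random name with no extension.
with tempfile.TemporaryDirectory() as tmp:
    saved = Path(tmp) / "upload_3f9a2c"
    saved.write_bytes(b"%PDF-1.4 placeholder bytes")
    renamed = keep_original_extension(str(saved), "Invoice.PDF")
    print(renamed.name)  # upload_3f9a2c.pdf
```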
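The following code works for me. If you handle file streams this way, giving each BytesIO stream a name taken from its S3 key (so it carries the file’s extension), there won’t be any issues.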
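```python
from io import BytesIO

# s3_client, AWS_STORAGE_BUCKET_NAME and file_keys are defined elsewhere
file_streams = []
for key in file_keys:
    obj = s3_client.get_object(Bucket=AWS_STORAGE_BUCKET_NAME, Key=key)
    file_stream = BytesIO(obj["Body"].read())
    # Name the in-memory stream after the S3 object so the SDK sees its extension
    file_stream.name = key.split("/")[-1]
    file_streams.append(file_stream)
```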
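If you want to catch the case-sensitivity problem before the API returns a 400, you could check each stream's name first. A minimal sketch, assuming the streams are built as in the code above; `SUPPORTED_EXTENSIONS` is a hypothetical, partial allow-list, so consult the supported-files page for the real one.

```python
import os
from io import BytesIO

# Hypothetical, illustrative subset only; not the full list of supported extensions.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".md", ".docx"}


def unsupported_streams(streams):
    """Return names of streams whose extension is not an exact, case-sensitive match.

    Exact matching mirrors the behavior reported in this thread, where '.PDF' was
    rejected while '.pdf' was accepted.
    """
    bad = []
    for stream in streams:
        # A stream without a name has no extension, so it is flagged too.
        name = getattr(stream, "name", "")
        if os.path.splitext(name)[1] not in SUPPORTED_EXTENSIONS:
            bad.append(name)
    return bad


streams = []
for name in ["notes.md", "Report.PDF", "summary.pdf"]:
    s = BytesIO(b"placeholder")
    s.name = name
    streams.append(s)
print(unsupported_streams(streams))  # ['Report.PDF']
```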