Cannot upload several file types to vector store

I was initially doing this with the API, which didn’t work, so I tried it using the UI directly.

These are just normal/simple files:

bash-3.2$ cat blah.php

<?php // PHP code goes here ?>

bash-3.2$ cat blah.py
f = open("demofile.txt", "r")
print(f.read())
bash-3.2$ cat blah.rb
puts 'wut'

I just dragged and dropped these into the “Add Files” dialog and immediately they say they’re unsupported

What is the extension of the file? Also, what is the ‘purpose’?

Answers are in the original post.

Sorry, but I don’t see what you’re passing in the ‘purpose’ parameter when uploading the file.

Maybe try the UI yourself? There’s no selector for purpose. As stated, I just dragged and dropped.

Yeah, my bad. I was able to replicate the same issue. When dragging a file to the vector store, it accepts json, txt, and docx files but rejects php, py, and rb files.

When I dragged in a docx file, it was stored with purpose ‘Assistant’ by default. I agree, it’s weird behavior.

Can you identify the encoding of the file locally? On Linux:

file --mime-encoding *.py

This will rule out the possibility that the file is not in one of the accepted encodings. If it is in an accepted encoding, then this is probably a bug in the UI.

$ file --mime-encoding blah.py
blah.py: us-ascii
$ file --mime-encoding blah.rb
blah.rb: us-ascii
$ file --mime-encoding blah.php
blah.php: us-ascii
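For anyone without the file utility, a rough equivalent of this check can be sketched in Ruby. This is only a sketch: classify_encoding is a hypothetical helper name (not part of any library), and it looks at raw bytes only, so it will not match everything file reports (for example, it does not detect utf-16).

```ruby
# Roughly classify a byte string the way `file --mime-encoding` would.
# `classify_encoding` is a hypothetical helper, not a library function.
def classify_encoding(bytes)
  data = bytes.dup.force_encoding(Encoding::BINARY)
  # Every byte below 0x80 means plain 7-bit ASCII.
  return "us-ascii" if data.ascii_only?
  # Otherwise check whether the bytes form valid UTF-8 sequences.
  return "utf-8" if data.dup.force_encoding(Encoding::UTF_8).valid_encoding?

  "unknown"
end

if __FILE__ == $PROGRAM_NAME
  # Usage: ruby check_encoding.rb blah.py blah.rb blah.php
  ARGV.each do |path|
    puts "#{path}: #{classify_encoding(File.binread(path))}"
  end
end
```

Note that any file classified as us-ascii here is also valid UTF-8 at the byte level, since ASCII is a subset of UTF-8.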

It was a problem with the API directly as well

OK, for now let’s skip the mess with drag-and-drop since you are unable to use the API to do the upload (and drag-and-drop debugging may require using the browser developer tools to track Fetch/XHR).

Which endpoint or method are you using? Are you able to upload to file store first and then add the file id to a vector store? Are you able to use the file batch method? Since I and others are able to upload files to vector storage, what is different with the functions you are using?

Are you able to upload ruby/py/php files? Using the UI? Using the API? Please confirm.


I’m using the ruby-openai gem (alexrudall/ruby-openai on GitHub):

file = client.files.upload(
  parameters: {
    file: path_to_upload,
    purpose: "assistants"
  }
)

response = client.vector_store_files.create(
  vector_store_id: vector_store['id'],
  parameters: {
    file_id: file['id']
  }
)

This results in this:

client.vector_store_files.list(vector_store_id: vector_store['id'])

{"object"=>"list",
 "data"=>
  [{"id"=>"file-w5ZxKuQYi5lSsOnhKLto7tgo",
    "object"=>"vector_store.file",
    "usage_bytes"=>0,
    "created_at"=>1720957235,
    "vector_store_id"=>"vs_BGZTsHutIG4ncQkAjQYo7pJm",
    "status"=>"failed",
    "last_error"=>{"code"=>"unsupported_file", "message"=>"The file type is not supported."},
    "chunking_strategy"=>{"type"=>"static", "static"=>{"max_chunk_size_tokens"=>800, "chunk_overlap_tokens"=>400}}}],
 "first_id"=>"file-w5ZxKuQYi5lSsOnhKLto7tgo",
 "last_id"=>"file-w5ZxKuQYi5lSsOnhKLto7tgo",
 "has_more"=>false}

and viewing the same thing in the UI:

Looks like the ruby gem is posting to /files to create the file

And then it’s posting to /vector_stores/:id/files to add it to the vector store:

So your dashboard (the file storage tab) should have one file with id
file-w5ZxKuQYi5lSsOnhKLto7tgo
What is the status, file size, and purpose shown for that file? Better yet, upload a screenshot. If nothing looks unusual, then try uploading the same content into a file with ‘.txt’ suffix and add to vector store, and then see if you get the same error.

seems legit:

blah.txt fails in exactly the same manner:

      file = client.files.upload(
        parameters: {
          file: path_to_upload,
          purpose: "assistants"
        }
      )

{"object"=>"file", "id"=>"file-cr2IqoYc5TKLAy9g3tXSsLn5", "purpose"=>"assistants", "filename"=>"blah.txt", "bytes"=>48, "created_at"=>1721005143, "status"=>"processed", "status_details"=>nil}

      response = client.vector_store_files.create(
        vector_store_id: vector_store['id'],
        parameters: {
          file_id: file['id']
        }
      )
{"id"=>"file-cr2IqoYc5TKLAy9g3tXSsLn5", "object"=>"vector_store.file", "usage_bytes"=>0, "created_at"=>1721005143, "vector_store_id"=>"vs_BGZTsHutIG4ncQkAjQYo7pJm", "status"=>"in_progress", "last_error"=>nil, "chunking_strategy"=>{"type"=>"static", "static"=>{"max_chunk_size_tokens"=>800, "chunk_overlap_tokens"=>400}}}

client.vector_store_files.list(vector_store_id: vector_store['id'])

{"object"=>"list",
 "data"=>
  [{"id"=>"file-cr2IqoYc5TKLAy9g3tXSsLn5",
    "object"=>"vector_store.file",
    "usage_bytes"=>0,
    "created_at"=>1721005143,
    "vector_store_id"=>"vs_BGZTsHutIG4ncQkAjQYo7pJm",
    "status"=>"failed",
    "last_error"=>{"code"=>"unsupported_file", "message"=>"The file type is not supported."},
    "chunking_strategy"=>{"type"=>"static", "static"=>{"max_chunk_size_tokens"=>800, "chunk_overlap_tokens"=>400}}},
   {"id"=>"file-w5ZxKuQYi5lSsOnhKLto7tgo",
    "object"=>"vector_store.file",
    "usage_bytes"=>0,
    "created_at"=>1720957235,
    "vector_store_id"=>"vs_BGZTsHutIG4ncQkAjQYo7pJm",
    "status"=>"failed",
    "last_error"=>{"code"=>"unsupported_file", "message"=>"The file type is not supported."},
    "chunking_strategy"=>{"type"=>"static", "static"=>{"max_chunk_size_tokens"=>800, "chunk_overlap_tokens"=>400}}}],
 "first_id"=>"file-cr2IqoYc5TKLAy9g3tXSsLn5",
 "last_id"=>"file-w5ZxKuQYi5lSsOnhKLto7tgo",
 "has_more"=>false}
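The create call above returns "in_progress" with no error, and the failure only shows up on a later list call. A minimal sketch of waiting for a final status and then printing last_error might look like the following. The helper names are hypothetical. The set of terminal statuses, and the retrieve call shown in the comment, are assumptions based on the API and ruby-openai documentation rather than anything shown in this thread.

```ruby
# Statuses assumed to be final; "in_progress" means processing is still running.
TERMINAL_STATUSES = %w[completed failed cancelled].freeze

# `fetch` is any callable that returns the vector store file hash, e.g. (assumed):
#   -> { client.vector_store_files.retrieve(vector_store_id: vs_id, id: file_id) }
def wait_for_vector_store_file(fetch, interval: 1.0, max_attempts: 30)
  max_attempts.times do
    record = fetch.call
    return record if TERMINAL_STATUSES.include?(record["status"])

    sleep(interval)
  end
  raise "vector store file still in progress after #{max_attempts} attempts"
end

# Format one record as a single line, including last_error when present.
def describe_result(record)
  error = record["last_error"]
  return "#{record['id']}: #{record['status']}" if error.nil?

  "#{record['id']}: #{record['status']} (#{error['code']}: #{error['message']})"
end
```

Passing the fetch step in as a callable keeps the polling logic independent of the client library, which also makes it easy to swap in a raw HTTP call when the gem itself is under suspicion.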

Here are the files. Not exactly anything complex.

https://drive.google.com/drive/folders/1jOlwaWBwhgv9TwK3JWGY48cKtkeVy7N-?usp=sharing

I’ve been suspecting the problem is associated with the mimetype header being problematic somewhere. To eliminate the ruby library as the cause, try using curl as in the shell script below:

#!/bin/sh
# Abort early if the API key is not set.
if [ -z "${OPENAI_API_KEY}" ]; then
        echo "Missing OPENAI_API_KEY env" >&2
        exit 1
fi

#CONTENT_TYPE="text/plain"
CONTENT_TYPE="application/json"
FILE_STORE_ID="file-w5ZxKuQYi5lSsOnhKLto7tgo"
VECTOR_STORE_ID="vs_BGZTsHutIG4ncQkAjQYo7pJm"

curl https://api.openai.com/v1/vector_stores/${VECTOR_STORE_ID}/files \
    -H "Authorization: Bearer ${OPENAI_API_KEY}" \
    -H "Content-Type: ${CONTENT_TYPE}" \
    -H "OpenAI-Beta: assistants=v2" \
    -d "{
         \"file_id\": \"${FILE_STORE_ID}\"
        }"

Set FILE_STORE_ID to the file id of either the ‘.rb’ or the ‘.txt’ file. If this works then I have remedial options for you, but you’re probably not going to like them.

EDITED above to get the right json syntax
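If curl is not convenient, the same request can be sketched with Ruby's standard library, which still bypasses the ruby-openai gem. The function names here are hypothetical. The URL, headers, and JSON body simply mirror the curl script above.

```ruby
require "json"
require "net/http"
require "uri"

# Build (but do not send) the POST that attaches an uploaded file to a vector store.
def build_attach_request(vector_store_id, file_id, api_key)
  uri = URI("https://api.openai.com/v1/vector_stores/#{vector_store_id}/files")
  request = Net::HTTP::Post.new(uri)
  request["Authorization"] = "Bearer #{api_key}"
  request["Content-Type"] = "application/json"
  request["OpenAI-Beta"] = "assistants=v2"
  request.body = JSON.generate({ "file_id" => file_id })
  [uri, request]
end

# Send the request over HTTPS and return the parsed JSON response.
def send_attach_request(uri, request)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(request)
  end
  JSON.parse(response.body)
end
```

A call would look like send_attach_request(*build_attach_request(vector_store_id, file_id, ENV.fetch("OPENAI_API_KEY"))), with the two ids taken from the earlier responses.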

I get this for both of them:

CONTENT_TYPE="application/json"
FILE_STORE_ID="file-cr2IqoYc5TKLAy9g3tXSsLn5"
VECTOR_STORE_ID="vs_BGZTsHutIG4ncQkAjQYo7pJm"

curl https://api.openai.com/v1/vector_stores/${VECTOR_STORE_ID}/files \
    -H "Authorization: Bearer ${OPENAI_API_KEY}" \
    -H "Content-Type: ${CONTENT_TYPE}" \
    -H "OpenAI-Beta: assistants=v2" \
    -d "{
         \"file_id\": \"${FILE_STORE_ID}\"
        }"

{
  "id": "file-cr2IqoYc5TKLAy9g3tXSsLn5",
  "object": "vector_store.file",
  "usage_bytes": 0,
  "created_at": 1721005143,
  "vector_store_id": "vs_BGZTsHutIG4ncQkAjQYo7pJm",
  "status": "in_progress",
  "last_error": {
    "code": "unsupported_file",
    "message": "The file type is not supported."
  },
  "chunking_strategy": {
    "type": "static",
    "static": {
      "max_chunk_size_tokens": 800,
      "chunk_overlap_tokens": 400
    }
  }
}%

So, going back to the file --mime-encoding output:

I see us-ascii, but the documentation says it only accepts utf-8, utf-16, or ascii. It could be that us-ascii is not treated as equivalent to ascii. If you can construct a file whose encoding is detected as utf-8, then this should work. Otherwise, it’s a bug, as the behavior does not match the documentation.

ding ding ding ding ding ding ding ding ding
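One way to test this hypothesis: a file of pure ASCII bytes is already byte-for-byte valid UTF-8, so simply re-saving it as UTF-8 changes nothing, and tools like file will still report us-ascii. The sketch below prepends a comment line containing a non-ASCII character so that the copy is detected as utf-8. The helper name and the marker text are hypothetical, and whether this satisfies the vector store's check is an assumption rather than something confirmed in detail here.

```ruby
# Write a copy of source_path that encoding detectors will classify as UTF-8,
# by prepending a comment line containing one non-ASCII character.
# `write_utf8_copy` is a hypothetical helper; comment_prefix must suit the file's language.
def write_utf8_copy(source_path, dest_path, comment_prefix: "#")
  text = File.read(source_path, mode: "r:UTF-8")
  marker = "#{comment_prefix} encoding marker: \u00e9\n"
  File.write(dest_path, marker + text, mode: "w:UTF-8")
  dest_path
end
```

This fits the .rb and .py files as written. For the .php file the marker would need to go inside the <?php ... ?> block instead of on the first line, otherwise PHP would emit it as output.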