I have found and tried to diagnose an error that prevented around 50% of my documents from being added to a vector store. Specifically, certain plaintext and Markdown files fail with the following error when they are added to a vector store: “The file type is not supported”.
However, simple (often one-character) changes can circumvent this error. Below is a minimal working example that reproduces the error, followed by one that differs by a single character and does not trigger it.
MWE1 (produces error)
from time import sleep

from openai import OpenAI
client = OpenAI(...)
content = """Lorem ipsum dolor sit amet, consectetur adipiscing elit. Proin ac porttitor eros. Etiam quis neque nisi. Proin in turpis augue. Vivamus ullamcorper lobortis enim, a bibendum urna mollis feugiat. Phasellus tortor justo, laoreet non elementum et, blandit faucibus lacus. Etiam luctus convallis massa vel facilisis. Sed congue in nibh bibendum sodales. Cras in diam vel ligula molestie imperdiet gravida eleifend urna. Proin eu lectus et erat lacinia mollis in at turpis. Nam sodales orci neque, vitae feugiat risus rutrum quis. Donec porttitor egestas eros, sed pulvinar eros. Aenean lacinia orci lorem. Curabitur sem augue, interdum a cursus ut, blandit nec felis. Ut bibendum eros tempus tellus imperdiet interdum. Nunc commodo sodales mattis.
Here is some data:
17
0.32
0.26"""
# Upload file
file_id = client.files.create(
    file=("mymarkdown.md", content.encode("utf-8"), "text/markdown"),
    purpose="assistants",  # purpose="user_data"
).id
# Create vector store
vector_store = client.vector_stores.create(file_ids=[file_id])
# Wait for file to be processed
sleep(1)
# List files in vector store
files = client.vector_stores.files.list(vector_store_id=vector_store.id)
if files.data[0].last_error:
    print("Error:", files.data[0].last_error)
else:
    print("Success")
Output:
Error: LastError(code='unsupported_file', message='The file type is not supported.')
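A side note on the sleep(1) in MWE1: one second may not always be enough for processing to finish, and a file that is still being processed would have no last_error yet, so it would be reported as "Success". A minimal sketch of a polling helper that waits for a terminal status instead is shown here. The helper name wait_for_terminal_status and the get_status callable are hypothetical (not part of the openai package), and the sketch assumes the only pending status is "in_progress". A stand-in replaces the API so it runs offline.

```python
import time


def wait_for_terminal_status(get_status, timeout=60.0, interval=1.0, sleep=time.sleep):
    """Poll get_status() until it returns something other than "in_progress".

    get_status is a hypothetical zero-argument callable supplied by the caller,
    for example one that re-fetches the vector store file and returns its status.
    Returns the final status, or raises TimeoutError once the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status != "in_progress":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still in_progress after {timeout} s")
        sleep(interval)


# Stand-in for the API so the sketch runs offline: two pending polls, then a failure.
_fake_statuses = iter(["in_progress", "in_progress", "failed"])
final_status = wait_for_terminal_status(lambda: next(_fake_statuses), interval=0.0)
print(final_status)
```

With the real client, get_status could be something like lambda: client.vector_stores.files.list(vector_store_id=vector_store.id).data[0].status, after which last_error can be inspected as in MWE1.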
MWE2 (does not produce error)
content = content.replace("0.32", "0.3")
... # remaining code same as above
Output:
Success
Simply removing this one character (the 2 in 0.32) circumvents the error.
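To look for other characters whose removal has the same effect, a brute-force scan over single-character deletions could be used. The following is only a sketch: single_char_fixes and is_rejected are hypothetical names, and is_rejected stands for whatever check you run (for instance uploading the text and inspecting last_error as in MWE1). The toy predicate below exists only so the example runs offline; it is not a claim about what actually causes the rejection.

```python
def single_char_fixes(content, is_rejected):
    """Return the indices whose deletion turns a rejected text into an accepted one.

    is_rejected is a hypothetical callable: text -> bool. In real use it would
    upload the text and check for last_error, so every call costs one upload.
    """
    if not is_rejected(content):
        return []  # nothing to fix
    fixes = []
    for i in range(len(content)):
        candidate = content[:i] + content[i + 1:]  # drop the character at index i
        if not is_rejected(candidate):
            fixes.append(i)
    return fixes


# Toy predicate for the offline demo ONLY; the real rule is unknown.
def toy_is_rejected(text):
    return "0.32" in text


sample = "17\n0.32\n0.26"
fix_positions = single_char_fixes(sample, toy_is_rejected)
print([(i, sample[i]) for i in fix_positions])
```

Because each candidate needs its own upload in real use, it is probably sensible to run this only on a short suspicious slice of the document rather than on the whole text.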
Thanks in advance for looking into this issue.