Is it possible to upsert when adding files to a vector store (as in collection.upsert() for Chroma)?
Can one even name the files in the vector store, so that it is possible to tell whether a file already exists there and should be updated rather than added again as a new (redundant) record? I cannot tell from the API docs whether this is possible; it appears that file names (file IDs) are assigned automatically. For example, from the quick start docs:
# Create a vector store called "Financial Statements"
vector_store = client.beta.vector_stores.create(name="Financial Statements")
# Ready the files for upload to OpenAI
file_paths = ["edgar/goog-10k.pdf", "edgar/brka-10k.txt"]
file_streams = [open(path, "rb") for path in file_paths]
# Use the upload and poll SDK helper to upload the files, add them to the vector store,
# and poll the status of the file batch for completion.
file_batch = client.beta.vector_stores.file_batches.upload_and_poll(
    vector_store_id=vector_store.id, files=file_streams
)
…so it appears that the user cannot assign names to the files, which means that if identically named files are uploaded to the vector store again, a new, redundant record is added instead of the existing one being updated.
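
To make the desired behaviour concrete, a client-side emulation of an upsert might look roughly like the sketch below. It rests on several assumptions: upsert_file is a hypothetical helper (not part of the SDK), the stored filename is assumed to equal the base name of the uploaded path, a vector store file's id is assumed to be the same as the underlying file ID, and the installed SDK version is assumed to expose these methods under client.beta.vector_stores.

```python
import os


def upsert_file(client, vector_store_id, path):
    """Hypothetical helper: emulate an upsert keyed on the file's base name."""
    target_name = os.path.basename(path)

    # Collect the IDs of files already in the store whose filename matches.
    # (Assumption: the vector store file id doubles as the file ID.)
    stale_ids = [
        vs_file.id
        for vs_file in client.beta.vector_stores.files.list(
            vector_store_id=vector_store_id
        )
        if client.files.retrieve(vs_file.id).filename == target_name
    ]

    # Detach each stale copy from the store, then delete the underlying file.
    for file_id in stale_ids:
        client.beta.vector_stores.files.delete(
            vector_store_id=vector_store_id, file_id=file_id
        )
        client.files.delete(file_id)

    # Upload the new version and attach it to the store.
    with open(path, "rb") as stream:
        return client.beta.vector_stores.files.upload_and_poll(
            vector_store_id=vector_store_id, file=stream
        )
```

With the quick start objects above, this would be called as upsert_file(client, vector_store.id, "edgar/goog-10k.pdf"). Note that it makes one files.retrieve call per stored file and is not atomic, so it is a workaround rather than a true upsert.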