You can ignore the above “you can try” reply, which is just repeating back your bad idea.
That method only adds a single file to an existing vector store, and it does not take an assistant ID as a parameter:
vector_store_file = client.beta.vector_stores.files.create(
    vector_store_id="vs_abc123",  # your existing vector store ID
    file_id="file-abc123",  # your uploaded file ID
)
More useful is the "batch" method, which submits multiple file IDs at once:
file_batch = client.beta.vector_stores.file_batches.create(
    vector_store_id="vs_abc123",  # your existing vector store ID
    file_ids=file_id_list,  # Python list of uploaded file IDs
    chunking_strategy={
        "type": "static",
        "static": {
            "chunk_overlap_tokens": 200,
            "max_chunk_size_tokens": 600,
        },
    },
)
Then you can use file_batch.id to poll the batch yourself if you want to block until processing is complete:
batch_status = client.beta.vector_stores.file_batches.retrieve(
    vector_store_id="vs_abc123",
    batch_id=file_batch.id,
)
>>> print(batch_status)
{
  "id": "vsfb_abc123",
  "object": "vector_store.file_batch",
  "created_at": 1699061776,
  "vector_store_id": "vs_abc123",
  "status": "in_progress",
  "file_counts": {
    "in_progress": 1,
    "completed": 1,
    "failed": 0,
    "cancelled": 0,
    "total": 2
  }
}
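A blocking loop over that retrieve call might look like the sketch below. The `wait_for_batch` helper name, the interval, and the timeout are my own choices, not part of the SDK:

```python
import time


def wait_for_batch(client, vector_store_id, batch_id, interval=2.0, timeout=600.0):
    """Poll a vector store file batch until its status leaves "in_progress"."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        batch = client.beta.vector_stores.file_batches.retrieve(
            vector_store_id=vector_store_id,
            batch_id=batch_id,
        )
        if batch.status != "in_progress":
            # "completed", "failed", or "cancelled" -- caller inspects file_counts
            return batch
        time.sleep(interval)
    raise TimeoutError(f"batch {batch_id} still in progress after {timeout}s")
```

A couple of seconds between polls is plenty; ingestion is not instant, and tighter loops just burn requests.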
With that in place, you can send over a bunch of files, iteratively or in parallel, building the list of IDs to batch:
# List of file paths to be uploaded
file_paths = ["file1.txt", "file2.txt"]

# List to store the file IDs of uploaded files
file_id_list = []

# Iterate over the file paths
for file_path in file_paths:
    # Open each file in binary read mode
    with open(file_path, "rb") as file:
        # Upload the file to the API
        response = client.files.create(
            file=file,
            purpose="assistants",
        )
    # Append the file ID from the response to the list
    file_id_list.append(response.id)
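The "in parallel" option can be sketched with a thread pool, since each upload is I/O-bound. The helper names here are mine; only the `client.files.create` call is from the API:

```python
from concurrent.futures import ThreadPoolExecutor


def upload_for_assistants(client, path):
    # Upload one file with purpose="assistants" and return its file ID
    with open(path, "rb") as f:
        return client.files.create(file=f, purpose="assistants").id


def upload_all(client, file_paths, max_workers=4):
    # Upload files concurrently; the returned IDs stay in path order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: upload_for_assistants(client, p), file_paths))
```

Keep `max_workers` modest so you don't trip per-organization rate limits with a large document set.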
The vector store ID needs to be connected to an assistant properly, and you'll want more controls than letting file search run wild with your budget:
assistant_parameters = {
    "name": "my document assistant",
    "description": "assistant with vector store",
    "model": model,  # variables let you begin your own UI
    "instructions": instructions,
    "top_p": 0.5,
    "temperature": 0.5,
    "tools": [
        {
            "type": "file_search",
            "file_search": {
                "max_num_results": 12,
                "ranking_options": {
                    "ranker": "default_2024_08_21",
                    "score_threshold": 0.4,
                },
            },
        }
    ],
    "tool_resources": {
        "file_search": {"vector_store_ids": [vector_store.id]},
    },
    "metadata": {},
    "response_format": "auto",
}
assistant = client.beta.assistants.create(**assistant_parameters)
print(f"Created Assistant ID: {assistant.id}")
Then you’ll wonder why you made all those API requests when you could have just blasted extracted document text, under your own control and inspection, as part of a chat completions message.
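That alternative, placing extracted text straight into a chat completions request, can be sketched like this. The helper name, the truncation budget, and the commented-out model name are all illustrative, not prescribed:

```python
def build_document_messages(document_text, question, max_chars=60_000):
    # Put the extracted document text directly in the message list,
    # truncated to a budget you control, instead of relying on
    # automatic chunk retrieval you can't inspect.
    snippet = document_text[:max_chars]
    return [
        {
            "role": "system",
            "content": "Answer using only the provided document.",
        },
        {
            "role": "user",
            "content": f"Document:\n---\n{snippet}\n---\n\nQuestion: {question}",
        },
    ]


# response = client.chat.completions.create(
#     model="gpt-4o-mini",  # illustrative model name
#     messages=build_document_messages(doc_text, "What is the refund policy?"),
# )
```

You pay for the tokens either way; here you at least see exactly what context the model received.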