I am using this code to create a vector store and then upload files to it with specific chunking strategies. The result is that the vector store stays empty, and no exception is thrown.
from openai import OpenAI

client = OpenAI()

vector_store = client.beta.vector_stores.create(
    name="human labeled dataset",
)
# Use a context manager so the file handle is closed after the upload.
with open("results/results_tsm_human_labeled.json", "rb") as f:
    client.beta.vector_stores.files.upload_and_poll(
        vector_store_id=vector_store.id,
        file=f,
        poll_interval_ms=1000,
        chunking_strategy={
            "type": "static",
            "static": {"max_chunk_size_tokens": 100, "chunk_overlap_tokens": 5},
        },
    )
with open("data/sample_tsm_new.json", "rb") as f:
    client.beta.vector_stores.files.upload_and_poll(
        vector_store_id=vector_store.id,
        file=f,
        poll_interval_ms=1000,
        chunking_strategy={
            "type": "static",
            "static": {"max_chunk_size_tokens": 1000, "chunk_overlap_tokens": 400},
        },
    )
Removing the chunking_strategy arguments has no effect.
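Since dropping chunking_strategy changes nothing, the chunking settings are probably not the cause. As a sanity check, the sketch below validates a static strategy against the limits I understand the OpenAI API reference to document (max_chunk_size_tokens between 100 and 4096, chunk_overlap_tokens non-negative and at most half of max_chunk_size_tokens). Those limits are an assumption worth re-checking against the current docs, and `validate_static_chunking` is a hypothetical helper, not an SDK function:

```python
from typing import Any


def validate_static_chunking(strategy: dict[str, Any]) -> list[str]:
    """Return problems found in a static chunking_strategy dict (empty list if none).

    The numeric limits are assumed from the API reference and may change.
    """
    if strategy.get("type") != "static":
        return ["type must be 'static' for this check"]
    static = strategy.get("static", {})
    size = static.get("max_chunk_size_tokens")
    overlap = static.get("chunk_overlap_tokens")
    problems: list[str] = []
    # Assumed range for the chunk size.
    if not isinstance(size, int) or not 100 <= size <= 4096:
        problems.append(f"max_chunk_size_tokens={size!r} is outside 100..4096")
    # Assumed rule: overlap must be >= 0 and at most half the chunk size.
    if not isinstance(overlap, int) or overlap < 0:
        problems.append(f"chunk_overlap_tokens={overlap!r} must be a non-negative int")
    elif isinstance(size, int) and overlap * 2 > size:
        problems.append(
            f"chunk_overlap_tokens={overlap} exceeds half of max_chunk_size_tokens={size}"
        )
    return problems
```

Under these assumed limits, both strategies used above (100/5 and 1000/400) produce an empty list, which fits the observation that removing them does not help.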
To debug, I tried uploading the files with client.files.create instead, without a chunking strategy:
human_dataset_result_json_file = client.files.create(
    file=open("results/results_tsm_human_labeled.json", "rb"), purpose="assistants"
)
human_dataset_json_file = client.files.create(
    file=open("data/sample_tsm_new.json", "rb"), purpose="assistants"
)
vector_store = client.beta.vector_stores.create(
    name="human labeled dataset",
    file_ids=[human_dataset_result_json_file.id, human_dataset_json_file.id],
)
This results in the vector store's file_counts being stuck at in_progress = 2: both files remain in processing and never finish.
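Rather than waiting indefinitely on a store whose files stay in_progress, one option is to poll it with a timeout so the hang becomes an explicit error that reports the counts. A minimal sketch, assuming the vector store object exposes `file_counts` with `in_progress`, `completed` and `failed` attributes; `wait_for_vector_store` is hypothetical and accepts any zero-argument `retrieve` callable, so it can be exercised without the API:

```python
import time
from types import SimpleNamespace
from typing import Any, Callable


def wait_for_vector_store(
    retrieve: Callable[[], Any],
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll `retrieve()` until no files are in progress, else raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        store = retrieve()
        counts = store.file_counts
        if counts.in_progress == 0:
            return store
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"still in_progress={counts.in_progress}, "
                f"completed={counts.completed}, failed={counts.failed} "
                f"after {timeout_s}s"
            )
        sleep(interval_s)


if __name__ == "__main__":
    # Fake retrieve() reporting 2, then 1, then 0 files in progress.
    pending = iter([2, 1, 0])

    def fake_retrieve() -> SimpleNamespace:
        n = next(pending)
        return SimpleNamespace(
            file_counts=SimpleNamespace(in_progress=n, completed=2 - n, failed=0)
        )

    store = wait_for_vector_store(fake_retrieve, interval_s=0.0)
    print(store.file_counts)
```

With the real client this could be called as `wait_for_vector_store(lambda: client.beta.vector_stores.retrieve(vector_store.id), timeout_s=600)`. Injecting `sleep` keeps the helper testable and lets the caller swap in a different backoff.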
Any ideas what I am doing wrong?