I think it might be better if you created the vector store, and then iterated through attaching each file to the vector store.
You can launch those as larger parallel groups of async API calls; you aren’t rate-limited there, but keep the amount of concurrency in progress bounded and reasonable for the endpoint so you aren’t directly inviting failures.
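A minimal sketch of that bounded concurrency, assuming the openai Python SDK (recent versions expose vector stores as client.vector_stores; older ones as client.beta.vector_stores) and a purely illustrative concurrency limit:

```python
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI()
MAX_CONCURRENCY = 8  # illustrative; keep this modest so the endpoint isn't overwhelmed

async def attach_file(vector_store_id: str, file_id: str, sem: asyncio.Semaphore):
    """Attach one already-uploaded file to the vector store."""
    async with sem:
        return await client.vector_stores.files.create(
            vector_store_id=vector_store_id,
            file_id=file_id,
        )

async def attach_all(vector_store_id: str, file_ids: list[str]):
    sem = asyncio.Semaphore(MAX_CONCURRENCY)
    tasks = [attach_file(vector_store_id, fid, sem) for fid in file_ids]
    # return_exceptions=True so one failed attach doesn't cancel the rest
    return await asyncio.gather(*tasks, return_exceptions=True)

# Example: asyncio.run(attach_all("vs_abcd", ["file-abc123", "file-def456"]))
```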
Then you’d have a processor that is resilient against individual failures: it keeps a database of the files being worked through and can retry a few times when one fails.
You’ll probably want to come up with a larger metadata format for processing: the file_id obtained by uploading (and confirmation that the upload succeeded), per-file metadata such as the chunking strategy, the attachment status and retries attempted, and then, bigger picture, grouping by job and by customer. Certainly also a facility for long-term maintenance, removing and deleting files (not too far then from running your own vector store).
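One possible shape for such a per-file tracking record, with purely illustrative field names, persisted in your own database:

```python
from dataclasses import dataclass

@dataclass
class FileAttachmentRecord:
    """Illustrative tracking record; field names are a suggestion, not an API."""
    local_path: str
    customer_id: str
    job_id: str
    file_id: str | None = None              # set once the upload succeeds
    chunking_strategy: dict | None = None   # e.g. a static chunking configuration
    attach_status: str = "pending"          # pending / in_progress / completed / failed
    retries: int = 0
    last_error: str | None = None
```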
You can poll an individual file for its success status:
GET https://api.openai.com/v1/vector_stores/{vector_store_id}/files/{file_id}
```json
{
  "id": "file-abc123",
  "object": "vector_store.file",
  "created_at": 1699061776,
  "usage_bytes": 1234,
  "vector_store_id": "vs_abcd",
  "status": "completed",
  "last_error": null
}
```
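In the Python SDK that poll is a retrieve on the vector store file; a sketch, with the interval and timeout as arbitrary placeholder values:

```python
import time

from openai import OpenAI

client = OpenAI()

def wait_for_attachment(vector_store_id: str, file_id: str,
                        poll_seconds: float = 2.0, timeout_seconds: float = 300.0) -> str:
    """Poll one vector store file until it leaves 'in_progress' or the timeout hits."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        vs_file = client.vector_stores.files.retrieve(
            file_id=file_id, vector_store_id=vector_store_id
        )
        if vs_file.status in ("completed", "failed", "cancelled"):
            return vs_file.status
        time.sleep(poll_seconds)
    return "timed_out"  # caller decides whether to delete and re-add
```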
Just don’t try adding a file again merely because it has been in progress too long, as you might ultimately end up with duplicates. Instead, delete the vector store file ID that hangs without arriving at "status": "completed", and then retry.
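Detaching the stuck file before any retry might look like this (note it only removes the file from the vector store; the uploaded file object itself survives unless you also delete it):

```python
from openai import OpenAI

client = OpenAI()

def detach_stuck_file(vector_store_id: str, file_id: str) -> None:
    """Remove a hung vector store file so a later retry can't create a duplicate."""
    client.vector_stores.files.delete(file_id=file_id, vector_store_id=vector_store_id)
    # client.files.delete(file_id) would additionally delete the uploaded file itself.
```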
You should be able to get this done in far less than two days.
There’s also a method to create batches of files. You’d still have the problem of a batch with failed files potentially taking a long time to finish, and since the batch status only returns counts, you then have to individually discover which file IDs failed and resolve them.
POST https://api.openai.com/v1/vector_stores/{vector_store_id}/file_batches
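A sketch of that batch route, assuming the file_batches helpers in the Python SDK (create_and_poll plus a status filter on the batch’s file listing, which is how you’d dig the failed IDs back out of those counts):

```python
from openai import OpenAI

client = OpenAI()

def attach_as_batch(vector_store_id: str, file_ids: list[str]) -> list[str]:
    """Attach files in one batch, then return whichever file IDs ended up 'failed'."""
    batch = client.vector_stores.file_batches.create_and_poll(
        vector_store_id=vector_store_id, file_ids=file_ids
    )
    # batch.file_counts only gives aggregate numbers (completed / failed / ...),
    # so list the batch's files filtered by status to find the actual failures.
    failed = client.vector_stores.file_batches.list_files(
        batch_id=batch.id, vector_store_id=vector_store_id, filter="failed"
    )
    return [f.id for f in failed]
```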