Hi everyone!
I'm using the Assistants API to extract information from a few PDFs. I have the following workflow (using the Python SDK):
- I make a single openai client and create a vector store
- I upload the files to a vector store once using upload_and_poll
- I then make a bunch of async calls (in parallel), each of which spawns a separate thread and run, and each of which uses the file_search tool with this vector store (a rough sketch is below)
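Roughly, the setup looks like this (a minimal sketch; the PDF paths, questions, and ASSISTANT_ID are placeholders, and the assistant is assumed to already have file_search enabled):

```python
import asyncio
from openai import OpenAI, AsyncOpenAI

ASSISTANT_ID = "asst_..."  # placeholder: an assistant with file_search enabled

client = OpenAI()
aclient = AsyncOpenAI()

# One-time setup: create the vector store and upload the PDFs into it.
vector_store = client.beta.vector_stores.create(name="pdf-extraction")
with open("doc1.pdf", "rb") as f1, open("doc2.pdf", "rb") as f2:
    client.beta.vector_stores.file_batches.upload_and_poll(
        vector_store_id=vector_store.id, files=[f1, f2]
    )

async def ask(question: str) -> str:
    # Each call gets its own thread, attached to the shared vector store,
    # and runs it to completion.
    thread = await aclient.beta.threads.create(
        messages=[{"role": "user", "content": question}],
        tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
    )
    await aclient.beta.threads.runs.create_and_poll(
        thread_id=thread.id, assistant_id=ASSISTANT_ID
    )
    messages = await aclient.beta.threads.messages.list(thread_id=thread.id)
    return messages.data[0].content[0].text.value

async def main():
    # All questions fire in parallel against the same vector store.
    print(await asyncio.gather(*(ask(q) for q in ["Q1...", "Q2...", "Q3..."])))

asyncio.run(main())
```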
The error I’m getting is (only occasionally, and without any specific pattern):
```
Error code: 409 - {'error': {'message': 'The vector store was updated by another process. Please reload and try again.', 'type': 'concurrent_modification', 'param': None, 'code': None}}
```
I'm not modifying the vector store myself, so I suspect something happens when too many threads access the same vector store at once? Does anyone have experience dealing with this 409 error?
For reference – the error occurs when I try to create the thread.
Did you solve this? I've got the same issue under the same circumstances. It seems like concurrent requests to the same vector store with the same assistant are the problem?
I didn't; I'm just retrying when it fails, and that seems to work.
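For anyone else hitting this: in the Python SDK a 409 surfaces as openai.ConflictError, so a small backoff wrapper is enough. A minimal sketch (create_thread_and_run is a hypothetical helper standing in for whatever call is failing):

```python
import asyncio
import openai

async def with_retry(coro_fn, max_attempts=5, base_delay=1.0):
    # Retry only on 409 ConflictError (the "concurrent_modification" case),
    # backing off exponentially between attempts.
    for attempt in range(max_attempts):
        try:
            return await coro_fn()
        except openai.ConflictError:
            if attempt == max_attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2**attempt)

# usage, e.g.: answer = await with_retry(lambda: create_thread_and_run(question))
```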
Can you check what version of the Python lib you have?
pip show openai
Just checked, I'm on version 1.42.0!
I had a similar issue/error using the Node SDK and the Uploads API.
```
error: {
  message: 'The vector store was updated by another process. Please reload and try again.',
  type: 'concurrent_modification',
  param: null,
  code: null
},
code: null,
param: null,
type: 'concurrent_modification'
```
Retrying did work, but on average it was taking me 3 attempts to get through 20 small file uploads.
My initial assumption was that I was misusing async/await somewhere, and possibly that I needed to complete all Uploads before updating the Vector Store to add the uploaded files to it. I was using a .map() operation on an array of files, which processed them with concurrent async operations.
The solution I settled on was to remove the .map() and replace it with:
```js
for (const file of array) {
  // ...
  // awaiting here makes each file's upload finish before the next begins
  await upload_process_for_each(file);
}
```
Doing so converted all of the concurrent async operations into sequential ones, completely processing each uploaded file before moving on to the next file's upload.
It takes significantly longer, but it is much cleaner: I stopped receiving the concurrent_modification error, and all of my larger (600–1000 KB) .pdf files stopped failing on upload.