I noticed that, at least since November 2025, newly created vector stores behave differently: nearly identical uploaded files report roughly half the storage usage, and the upload process runs about 10× slower.
For comparison, here are mostly the same files, with the same chunking strategy, batch size, and concurrency, uploaded to a vector store created on October 20th:
* Info for vs_68f67d45371481919334e7e8878c6d1d *
Usage MB: 55.11
File Counts: {
"in_progress": 0,
"completed": 7754,
"failed": 0,
"cancelled": 0,
"total": 7754
}
However, if I retrieve all file IDs and sum each file's reported size via openai.files.retrieve(fileId), I obtain:
Vector store ID 'vs_68f67d45371481919334e7e8878c6d1d' has 7754 files COUNTED.
Total size COUNTED SUM: 23.90 MB
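For reference, the per-file sum above is computed along these lines. The summing helper below is the actual arithmetic; the SDK calls in the comments sketch how the file objects are gathered and are an assumption about my script, not a verified API reference:

```javascript
// Sum the reported size of a list of File objects and convert to MB.
// Each File object returned by openai.files.retrieve(fileId) carries a
// `bytes` field with its size on the server.
function sumUsageMB(files) {
  const totalBytes = files.reduce((acc, f) => acc + f.bytes, 0);
  return totalBytes / (1024 * 1024);
}

// Assumed gathering step (requires OPENAI_API_KEY; not run here):
//   const openai = new OpenAI();
//   const ids = [];
//   for await (const f of openai.vectorStores.files.list(storeId)) ids.push(f.id);
//   const files = await Promise.all(ids.map((id) => openai.files.retrieve(id)));
//   console.log(`Total size COUNTED SUM: ${sumUsageMB(files).toFixed(2)} MB`);
```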
And here are almost the same files (about 20% more of the same type), uploaded to a vector store created on November 13th, with the files uploaded the same day:
* Info for vs_6915f4ee529c8191b2ea3b5acba4b794 *
Usage MB: 25.53
File Counts: {
"in_progress": 127,
"completed": 8966,
"failed": 9,
"cancelled": 0,
"total": 9102
}
Vector store ID 'vs_6915f4ee529c8191b2ea3b5acba4b794' has 9102 files COUNTED.
Total size COUNTED SUM: 25.55 MB
For the older vector store, the usage reported by openai.vectorStores.retrieve(storeId) does not match the sum of per-file usage (55.11 MB reported vs. 23.90 MB summed). For the newer vector store, the store-level usage and the per-file sum are almost identical (25.53 MB vs. 25.55 MB). In addition, the new vector store shows more failed uploads, and its in_progress files appear stuck.
If I delete and re-upload files to the same older vector store vs_68f67d45371481919334e7e8878c6d1d from October, files still upload much faster (~30 minutes) and take more space, around 55 MB. Uploading to the newly created vector store with the same script and configs takes about 5 hours (a ~10× slowdown), and the total vector store size is about half as large.
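For context, the upload script caps concurrency with a small helper like the one below. The limiter itself is shown in full; the worker body that calls the SDK is a hedged assumption about what each task does, not a confirmed API shape:

```javascript
// Run async tasks over `items` with at most `limit` in flight at once --
// the batching/concurrency pattern the upload script uses. In the real
// script each worker uploads one file to the vector store (assumed call:
// openai.vectorStores.files.uploadAndPoll(storeId, fileStream)).
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // shared cursor; safe because JS is single-threaded
  async function run() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }
  // Spawn `limit` runners that pull items until the queue is drained.
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, run));
  return results;
}
```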
On disk, those files take about 45 MB, so I would assume that, with the embeddings included, the ~55 MB reported for the October vector store is expected and normal.
Why is there a discrepancy between reported and computed storage usage in the older vector store?
I cannot find any documentation or announcements about vector stores being updated. The latest SDK version does not contain any related changes or documentation either.
My main questions:
1. Is there any way to restore the previous upload speeds in newly created vector stores?
2. What are best practices for mitigating stuck in_progress and failed files?
3. Has there been a recent (permanent) change to the vector store backend, and what exactly causes the increased latency and the different reported usage bytes?
Update Nov 26th 2025: Today I created a new vector store and, after uploading the same files, the total size displayed in the dashboard is back to the pre-November value of around 50 MB, i.e. roughly double what the same upload reported two weeks ago.