We have 400GB of unique, already pared-down and filtered data (large genomic). We are trying to get this into a Vector Store attached to an Assistant. My understanding is that there is a 10K file limit and a 5MB per-file upload (attach) limit for Vector Stores, and that this is different from the 512MB limit for “Files”. Is my understanding correct? Will my need to break 400GB into 5MB files require several Vector Stores to be created and attached to my Assistant due to the 10K file limit? We are uploading via the API.
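For concreteness, here is a minimal sketch of the upload-then-attach flow in question, assuming the Python SDK (file and store names are placeholders; older SDK versions expose the vector store methods under client.beta.vector_stores):

```python
from openai import OpenAI

client = OpenAI()

# 1) Upload to the Files API (this is where the 512MB "Files" limit applies).
f = client.files.create(file=open("genome_shard_0001.txt", "rb"), purpose="assistants")

# 2) Create a vector store and attach the file (the per-file vector store
#    limits are enforced at/after this attach step).
vs = client.vector_stores.create(name="genomics-shards")
client.vector_stores.files.create(vector_store_id=vs.id, file_id=f.id)
```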
Hi.
From what I could find, the limit seems to be 5 million tokens per file.
Limits
The maximum file size is 512 MB. Each file should contain no more than 5,000,000 tokens per file (computed automatically when you attach a file).
That is the FILES limit; the Files API will gladly accept a 500MB file. The issue is attaching it to the vector store, which needs the file to be under 5MB based on our tests.
I also checked the documentation for the Assistants API and found this quote in the deep dive section
You can attach a maximum of […] 10,000 files to file_search (using vector_store objects).
Each file can be at most 512 MB in size and have a maximum of 5,000,000 tokens. By default, the size of all the files uploaded in your project cannot exceed 100 GB, but you can reach out to our support team to increase this limit.
It appears the bottleneck is actually the 100 GB limitation.
Also, the Assistants API is already deprecated and will be sunset next year. But maybe I misread this part.
PS. Taking a look at the Responses API documentation, I think @aprendendo.next did quote the file size limitations for vector stores correctly.
If you look a little above on that page, the reference belongs to a section about vector stores (which does use the Files API to feed them):
You can see it more clearly in the URL:
https://platform.openai.com/docs/guides/retrieval?vector-store-batch-operations=upload#limits
The catch is that it is either 512MB or 5 million tokens, whichever comes first (which is usually tokens in this case). Notice that tokens are a bit different from MB.
Did you run into a particular error in the API that led you to believe there is a hard limit of 5 MB?
Also just curious, it seems that you are intending to look for patterns in genetic data, is that correct? I recommend making a small test with a reduced dataset to make sure textual semantic search from a vector store is what you are looking for.
>>Did you run into a particular error in the API that led you to believe there is a hard limit of 5 MB?
“File too large” in the Vector Store API, until we got it below 5MB.
it seems that you are intending to look for patterns in genetic data
This is a retrieval tool only; the output files contain all the data we need and only the data we need. Let’s try to solve the Vector Store size issue and not worry about app or data logic, please.
We can upload the 100MB files no problem; we just can’t attach them. We get a similar error when uploading via the Dashboard, but it just shows “Failed” with no other info.
Can you let me know where I can read about this? Note that this is the Assistants tool found in the OAI Dashboard, used to build Vector Based Assistants to be queried via the API.
I asked about the data because tokens go well with words, but with raw genetic data you could end up with a much lower characters-per-token ratio. But alright, let’s move on.
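A quick way to sanity-check that ratio with tiktoken (a sketch; both strings are made up):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

english = "The quick brown fox jumps over the lazy dog. " * 20
dna_like = "ACGTTGCAACGGATCCTAGGCATGCA" * 35  # made-up nucleotide-style string

for label, text in [("english", english), ("dna-like", dna_like)]:
    n_tokens = len(enc.encode(text))
    print(f"{label}: {len(text)} chars, {n_tokens} tokens, "
          f"{len(text) / n_tokens:.2f} chars/token")
```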
https://platform.openai.com/docs/assistants/migration
After achieving feature parity in the Responses API, we’ve deprecated the Assistants API. It will shut down on August 26, 2026. Follow the guidance below to update your integration. Learn more.
Do notice, though, that vector stores are not going away. You can still use them in the Responses API via a tool, now called file_search, to which you can attach one or more vector stores.
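For example, a minimal Responses API call with file_search (a sketch; the model name, query, and vector store ID are placeholders):

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4o-mini",                           # placeholder model
    input="Which records mention variant rs123?",  # placeholder query
    tools=[{
        "type": "file_search",
        "vector_store_ids": ["vs_abc123"],         # placeholder vector store ID
    }],
)
print(response.output_text)
```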
Developers have uploaded more than 10,000 files before. The problem is that it becomes very unmanageable:
- the list endpoint has a maximum of 10,000 returns, with no pagination method;
- batch uploads can fail and report failed files only by count, leaving a very poor forensic trail of what needs to be retried, while retrying successes creates duplicates (see the sketch after this list);
- platform lock-in, and repeated days-long outages, while you are still billed for storage.
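For illustration, a batch upload roughly looks like this, and the only failure report you get back is the counts object (a sketch; the vector store ID and shard paths are placeholders, and older SDK versions expose this helper under client.beta.vector_stores):

```python
from openai import OpenAI

client = OpenAI()

# Placeholder shard paths and vector store ID.
paths = ["shard_0001.txt", "shard_0002.txt", "shard_0003.txt"]
streams = [open(p, "rb") for p in paths]

batch = client.vector_stores.file_batches.upload_and_poll(
    vector_store_id="vs_abc123",
    files=streams,
)

# The batch result reports outcomes as counts only (completed/failed/cancelled),
# not as per-file names -- that is the forensic gap described above.
print(batch.status, batch.file_counts)
```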
400 GB is going to mean 100GB or so of just vectors to compare against the query by dot product.
Then, let’s talk about that storage cost.
From your figure of 400GB, where “pared down and filtered” might mean you have plain text without document overhead that would be discarded, you might conclude you’d spend $40 per day for storage (at $0.10 per GB per day). However, the billed storage usage will be higher than that. Consider a “chunk”:
- 800 tokens: 3.2kB of English language string typical
- (400 tokens: repeated from other chunks in overlap)
- 1 embedding: 256 dimensions of float32: 1kB
- (additional metadata?)
So one might assume that 10GB of files would cost you $1 daily. However, it is going to be significantly more.
How much is the actual billing for total vector storage going to be, even if we disregard the partial chunks (with full-length embeddings) that come at the end of a file?
The most likely simple, context-unaware method they use to provide “overlap” at the defaults is just striding through the text encoded as an array of token IDs:
Step 1: start 0, 800 tokens, 0-799
Step 2: start 400, 800 tokens, 400-1199
Step 3: start 800, 800 tokens, 800-1599…
Step N: remaining tokens
The chunks are then stored back as strings, each resulting in an accompanying ~1kB truncated vector held as storage as well.
You can see that this nearly doubles the storage consumption and vector count vs without overlap.
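A minimal sketch of that kind of stride-based chunking, using tiktoken with the default 800-token window and 400-token stride (this illustrates the method described above, not OpenAI’s actual implementation):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # assumption: a representative tokenizer

def chunk_with_overlap(text: str, chunk_tokens: int = 800, stride: int = 400) -> list[str]:
    """Slide a fixed-size token window through the text, stepping by `stride` tokens."""
    ids = enc.encode(text)
    chunks = []
    start = 0
    while start < len(ids):
        window = ids[start:start + chunk_tokens]
        chunks.append(enc.decode(window))
        if start + chunk_tokens >= len(ids):
            break  # this window already reaches the end of the text
        start += stride
    return chunks

# 250,000 tokens of input -> 1 + ceil((250,000 - 800) / 400) = 624 chunks,
# i.e. ~499,200 stored tokens for ~250,000 original ones (the ~2x amplification).
```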
I set “Pro with code interpreter” loose to calculate your cost amplification, using 1MB of extracted text vs. your comparable 5MB:
With default overlap (stride = 400):
- Chunks ≈ 1 + ceil((250,000 − 800)/400) = 624
- Stored text tokens ≈ 624 × 800 = 499,200 (due to boundary alignment in this example)
- Stored text bytes = 499,200 × 4 ≈ 1,996,800 bytes ≈ 2.00 MB
Amplification (strings only): ~2.00 MB / 1.00 MB ≈ 2.0×

Embeddings vector count increase:
- Vectors without overlap: 313
- Vectors with 50% overlap: 624 (≈ 2×)
- If each embedding is 256-d float32 ≈ 1 KB, embedding bytes go from ~313 KB → ~624 KB (≈ 2×).
Storage for strings is approximately 1.998 MB, while embeddings are around 0.640 MB. The combined storage stands at roughly 2.638 MB for a 1MB text file.
Ingesting 400 GB of text yields ~250M vectors, whose embeddings occupy ~256 GB. If it were accepted and could run, that’s far more than GPU-accelerated dot-product search can do for you, and more than any platform is likely to allocate. You are going to be the first developer to fail on the platform. This is a large-scale computation-cluster task, not “chat with my PDFs”.
So extrapolate 2.6x to 400GB: 1040GB, which is over $100 a day recurring.
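The back-of-envelope arithmetic behind that, in one place (all constants are the assumptions stated above: 4 bytes per token, 800-token chunks, 400-token stride, ~1 KB per truncated embedding, $0.10/GB/day):

```python
import math

BYTES_PER_TOKEN = 4       # rough average for plain English text
CHUNK_TOKENS = 800        # tokens per stored chunk
STRIDE = 400              # tokens between chunk starts (50% overlap)
EMB_BYTES = 1024          # ~1 KB per truncated 256-d float32 embedding
PRICE_PER_GB_DAY = 0.10   # vector store storage price used above

source_gb = 400
source_tokens = source_gb * 1e9 / BYTES_PER_TOKEN                # ~100 billion tokens
chunks = 1 + math.ceil((source_tokens - CHUNK_TOKENS) / STRIDE)  # ~250 million chunks
stored_text_gb = chunks * CHUNK_TOKENS * BYTES_PER_TOKEN / 1e9   # ~800 GB (2x the source)
stored_emb_gb = chunks * EMB_BYTES / 1e9                         # ~256 GB of vectors
total_gb = stored_text_gb + stored_emb_gb

print(f"chunks: {chunks / 1e6:.0f}M")
print(f"text: {stored_text_gb:.0f} GB, embeddings: {stored_emb_gb:.0f} GB")
print(f"total: {total_gb:.0f} GB  ->  ${total_gb * PRICE_PER_GB_DAY:.0f}/day")
```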
You would do far better to run your own distributed vector database and do a one-time embeddings run. Then you even have “ownership” of the full-length embeddings, and can also use the full 1536 or 3072 dimensions to collect and re-rank preliminary results. Or contract this with a vector store solutions provider, that could come in at under $30k annually.
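If you go that route, the one-time embeddings pass itself is simple (a sketch assuming text-embedding-3-large at its full 3072 dimensions; chunking and the storage layer are up to you):

```python
from openai import OpenAI

client = OpenAI()

def embed_batch(chunks: list[str]) -> list[list[float]]:
    """Embed a batch of text chunks at full dimensionality (3072-d for text-embedding-3-large)."""
    resp = client.embeddings.create(model="text-embedding-3-large", input=chunks)
    return [item.embedding for item in resp.data]

# vectors = embed_batch(["chunk one ...", "chunk two ..."])
# Persist the vectors in your own store (FAISS, pgvector, Milvus, ...) and
# keep the chunk text alongside them for re-ranking.
```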
We use the Responses API, not the Assistants API. The Assistant tool I am referring to is the pre-built assistant in the Dashboard, which you can attach to Responses API calls.
For the record, it seems that it is possible to upload larger files.
The total storage, though, will be larger as expected due to the chunk overlaps:
Thank you for the screenshot. You uploaded and attached via the API?
Yes. Here is another one.

Notice that although it is 19MB, it has 4.8 million tokens (thus, it is within the 5M token limit).
You can count tokens using tiktoken.
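For example (a sketch; the file path is a placeholder, and cl100k_base is just one common encoding, which may not match the tokenizer used for ingestion):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # assumption: a representative encoding

with open("my_shard.txt", "r", encoding="utf-8") as fh:  # placeholder path
    n_tokens = len(enc.encode(fh.read()))

print(f"{n_tokens:,} tokens -> {'within' if n_tokens <= 5_000_000 else 'over'} the 5M limit")
```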