What size should each string be when fetching vector embeddings from the OpenAI API?

I have several documents that I need to fetch vector embeddings for using the OpenAI API. I'm using the large model, which has a vector length of 3072, and for this purpose I'm dividing my documents into segments. How long should each segment be when I fetch its vector embedding? I'm asking because I don't want to split into substrings that are too small and end up using a lot of extra memory to store the embedding vectors.

Great question! Rolling your own document retrieval is a nice approach: you can tune it exactly the way you want, rather than accepting the defaults of a managed pay-per-use service.

You’re already thinking in the right direction about the trade-off between vector size and the text it represents. Here’s how it breaks down:

  • At the large end, you've got 3072 dimensions, which works out to about 12kB per embedding (3072 × 4 bytes) if you store full dimensionality as 32-bit floats.
  • When you look at that, it makes sense to wonder: shouldn’t I be pairing that with roughly 12kB of text, otherwise I’m actually storing more data in the vectors than in the original strings? And honestly, that’s a fair way to think about it.

The nice part is that the text-embedding-3-large and text-embedding-3-small models are flexible. You can use the API's dimensions parameter to request shorter embeddings, or you can do your own reduction (taking the first N components and re-normalizing). With that, the large model can still perform really well—even down at 256 dimensions or fewer.
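The do-it-yourself reduction is just a truncate-and-renormalize. Here's a minimal sketch with numpy; the random vector is a stand-in for what you'd actually get back from the embeddings endpoint (where passing dimensions=256 to the API would give you the equivalent result directly):

```python
import numpy as np

def truncate_and_renormalize(embedding, target_dim):
    """Keep the first target_dim components of an embedding,
    then rescale to unit length so cosine/dot-product search still works."""
    truncated = np.asarray(embedding, dtype=np.float32)[:target_dim]
    return truncated / np.linalg.norm(truncated)

# Stand-in for a 3072-dim vector from text-embedding-3-large;
# in practice this would come from the embeddings API call.
full = np.random.default_rng(0).normal(size=3072)
reduced = truncate_and_renormalize(full, 256)
print(reduced.shape)  # (256,)
```

The re-normalization matters: without it, truncated vectors have slightly different lengths and dot-product scores are no longer comparable across chunks.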

To put numbers on it:

  • 256 dimensions × 32 bits ≈ 1kB per embedding, which is a lot lighter.
  • You can also store them in float16 format to cut the size further, with only a tiny quality drop.
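You can sanity-check those storage numbers directly with numpy's nbytes:

```python
import numpy as np

rng = np.random.default_rng(0)

full_f32 = rng.normal(size=3072).astype(np.float32)   # full-size embedding
small_f32 = full_f32[:256]                            # truncated to 256 dims
small_f16 = small_f32.astype(np.float16)              # half-precision storage

print(full_f32.nbytes)   # 12288 bytes ≈ 12kB
print(small_f32.nbytes)  # 1024 bytes  = 1kB
print(small_f16.nbytes)  # 512 bytes
```

So truncation plus float16 takes you from ~12kB to half a kilobyte per embedding, a 24× reduction, before any fancier quantization.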

From there, it really comes down to what you’re trying to do. Do you need whole passages that make sense to both an AI and a human? Or are you more focused on retrieving many smaller chunks and ranking/sorting them back into document order? The way you define your “document” and its semantic units is what will guide the best setup.
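If you go the many-small-chunks route, one common starting point is fixed-size chunks with some overlap so that sentences straddling a boundary aren't lost. The word counts below are illustrative defaults, not recommendations:

```python
def chunk_text(text, chunk_words=200, overlap_words=40):
    """Split text into overlapping word-based chunks.

    chunk_words and overlap_words are illustrative starting points;
    tune them to whatever "semantic unit" fits your documents."""
    words = text.split()
    step = chunk_words - overlap_words
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):
            break
    return chunks

doc = " ".join(f"word{i}" for i in range(500))
chunks = chunk_text(doc)
print(len(chunks))  # 3 overlapping chunks for a 500-word document
```

Because you keep the chunk order, you can always stitch neighboring chunks back together when presenting results to a human.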

Also important: the queries that will be written, and how well they match large vs. small segments of text. The embedding model has to understand what a document section and a query are each about, and ideally capture the features that distinguish one section from another.

Hope that gives you a solid starting point to balance size and performance in the way that feels right for your application!