Download raw file_search embeddings?

The new File Search API does all the heavy lifting of parsing and chunking files into text-embedding-3-large embeddings. I understand that these files and vector stores can be referenced in future completions.

Is there any way to download the embeddings themselves or the parsed text? Our group already parses and chunks documents and queries the vectors with additional profile context, which wouldn’t be possible in a vector store.
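As a rough illustration of what querying with profile context can look like when you hold the vectors yourself, here is a minimal sketch. The field names (`audience`, `vector`, `audiences`) and the `search` helper are hypothetical, and real vectors would come from the embedding model rather than being written by hand.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec, chunks, profile, top_k=3):
    """Drop chunks the profile may not see, then rank the rest by similarity.

    `chunks` is a list of dicts with hypothetical keys "audience" and "vector";
    `profile` is a dict with a hypothetical "audiences" set.
    """
    visible = [c for c in chunks if c["audience"] in profile["audiences"]]
    visible.sort(key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return visible[:top_k]
```

The filter runs before the ranking, so the profile decides which chunks are even candidates. That step depends on having the stored vectors and their metadata in hand.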

Currently you can submit a file as part of a chat and ask for a markdown version, but of course that’s not chunked and is limited by the context window.

The vector database can only be used to enhance the knowledge of an assistant. The only way to interact with it is to provide the input files; there is no method for getting the embeddings or chunks back out.

It is unlikely that OpenAI is going to be in the business of offering free embeddings for you to use with any AI from any provider. (Even then, you would still need the same embedding model to embed each query you want to match.)

The vector store is effectively subsidized by Assistants usage: the text it returns is passed to the language model as billable tokens.

Handing over the chunked data and search output would also reveal the proprietary methods used, such as search query rewriting (or leave you unable to employ them on your side), although they’ve been more forthcoming in this round of API updates.

I would speculate that OpenAI wouldn’t expand its offerings so overtly to compete in the commercial vector database business, a vertical served by its partners. (But if it can make a buck?)

You have likely already pondered more enhancements to the embeddings than a one-size-can’t-fit-all service offers: additional metadata embedded about the source context, page numbers and table of contents, chunk length adapted to your application, chunking that follows the sections of documents, etc.
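Two of those ideas, chunking that follows document sections and embedding source context along with the text, can be sketched in a few lines. This is only an illustration under stated assumptions: it assumes markdown-style `#` headings, and the `Chunk` class, `chunk_by_section` function, and `max_chars` parameter are hypothetical names, not part of any OpenAI API.

```python
import re
from dataclasses import dataclass

@dataclass
class Chunk:
    source: str   # e.g. file name (hypothetical metadata field)
    section: str  # heading the text was found under
    text: str

    def embedding_input(self) -> str:
        # Prefix the source context so it is embedded together with the body.
        return f"[{self.source} > {self.section}]\n{self.text}"

def chunk_by_section(doc: str, source: str, max_chars: int = 800) -> list[Chunk]:
    """Split on '#' headings; pack paragraphs up to roughly max_chars per chunk."""
    chunks: list[Chunk] = []
    section = "Preamble"
    paragraphs: list[str] = []

    def flush() -> None:
        # Emit the buffered paragraphs of the current section as chunks.
        current = ""
        for para in paragraphs:
            if current and len(current) + len(para) + 2 > max_chars:
                chunks.append(Chunk(source, section, current))
                current = para
            else:
                current = f"{current}\n\n{para}" if current else para
        if current:
            chunks.append(Chunk(source, section, current))
        paragraphs.clear()

    for block in re.split(r"\n\s*\n", doc.strip()):
        block = block.strip()
        if not block:
            continue
        first, _, rest = block.partition("\n")
        heading = re.match(r"#+\s+(.*)", first)
        if heading:
            flush()  # close out the previous section before switching
            section = heading.group(1).strip()
            if rest.strip():
                paragraphs.append(rest.strip())
        else:
            paragraphs.append(block)
    flush()
    return chunks
```

A single paragraph longer than `max_chars` is kept whole rather than cut mid-sentence. Whether that trade-off suits you depends on your documents, and you would pass `chunk.embedding_input()` rather than `chunk.text` to whatever embedding model you use.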