Download raw file_search embeddings?

joshwashere · April 24, 2024, 4:27pm

The new File Search API is doing all the heavy lifting to parse & chunk files into text-embedding-3-large embeddings. I understand that these files/vector stores can be referenced in future completions.

Is there any way to download the embeddings themselves or the parsed text? Our group is already parsing/chunking documents & querying the vectors with additional profile context, which wouldn’t be possible in a vector store.

Currently you can submit a file as part of a chat & ask for a markdown version, but of course that’s not chunked and is context-limited.
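For readers unfamiliar with the pattern, here is a minimal sketch of what querying self-hosted vectors "with additional profile context" could look like: filter chunks by profile metadata first, then rank the survivors by cosine similarity. The `Chunk` class, the `search` function, and the `team` key are hypothetical names for illustration, not part of any OpenAI API, and the toy two-dimensional vectors stand in for real embeddings.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical record: chunk text, its embedding, and profile metadata."""
    text: str
    vector: list
    meta: dict = field(default_factory=dict)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(chunks, query_vec, profile, top_k=3):
    """Keep chunks whose metadata matches every profile key, then rank by similarity."""
    allowed = [
        c for c in chunks
        if all(c.meta.get(key) == value for key, value in profile.items())
    ]
    ranked = sorted(allowed, key=lambda c: cosine(c.vector, query_vec), reverse=True)
    return ranked[:top_k]


# Toy usage: vectors here are made up; real ones would come from an embedding model.
store = [
    Chunk("pricing FAQ", [1.0, 0.0], {"team": "sales"}),
    Chunk("contract terms", [0.0, 1.0], {"team": "legal"}),
]
hits = search(store, [0.9, 0.1], {"team": "sales"})
```

The query vector has to come from the same embedding model as the stored vectors for the similarity scores to mean anything.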

_j · April 24, 2024, 8:38pm

The vector database can only be used to enhance the knowledge of an assistant. There are no other methods besides providing the input files.

It is unlikely that OpenAI is going to be in the business of offering free embeddings for you to use with any AI from any provider (except that the query embedding to be matched must come from the same model).

The vector store is subsidized by the Assistants’ language model, which consumes the returned text as billable tokens.

Getting the chunked data and search output would also reveal (or not allow you to employ) the proprietary methods used, such as search query rewriting, although they’ve been more forthcoming in this round of API updates.

I would speculate that OpenAI wouldn’t directly expand their offerings to compete in the commercial vector database business vertical of its partners so overtly. (But if it can make a buck?)

You have likely already pondered more enhancements to the embeddings than a one-size-can’t-fit-all approach allows: additional metadata embedded about the source context, page numbers and table of contents, chunk length adapted to your application, chunking that follows the sections of documents, etc.
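As a rough illustration of two of those ideas (chunking that follows document sections, and source metadata embedded with each chunk), here is a minimal sketch. It assumes the parsed text is markdown-like with `#` headings; `chunk_by_section`, the `[source > section]` prefix format, and the `max_chars` default are all hypothetical choices, not anything File Search is known to do.

```python
import re

HEADING = re.compile(r"^#{1,6}\s+(.*)")


def _emit(chunks, source, title, body_lines, max_chars):
    """Append one or more chunks for a section, each prefixed with its context."""
    text = "\n".join(body_lines).strip()
    if not text:
        return
    # Long sections are cut into fixed-size pieces; every piece keeps the prefix.
    for start in range(0, len(text), max_chars):
        piece = text[start:start + max_chars]
        chunks.append({
            "source": source,
            "section": title,
            "text": f"[{source} > {title}]\n{piece}",
        })


def chunk_by_section(markdown_text, source, max_chars=1000):
    """Split text at markdown headings so chunks never straddle two sections."""
    chunks = []
    title = "(untitled)"
    body = []
    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            _emit(chunks, source, title, body, max_chars)
            title = match.group(1).strip()
            body = []
        else:
            body.append(line)
    _emit(chunks, source, title, body, max_chars)
    return chunks


doc = "# Intro\nhello\n## Setup\nstep one\nstep two"
pieces = chunk_by_section(doc, "guide.md")
```

The `text` field is what would be sent to the embedding model, so the source and section names become part of what the vector encodes; the separate `source` and `section` fields stay available for filtering.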
