Did we lose summarization capabilities in the migration from Retrieval to file_search?

If this is a duplicate, please redirect me. I have read a number of posts but did not find answers.

I would like to do both semantic search and summarization of files, but I am concerned this is not possible.

Here’s my situation:

  • The beta v1 assistants Retrieval tool was quite successfully able to produce document summaries for me.
  • Now I want to add semantic search using file_search in beta v2.
  • I think that I want to keep using Retrieval for creating summaries because, to the best of my understanding, file_search is not good at summarization. From the docs: “Known limitations […] Better support for summarization — the tool today is optimized for search queries.”
  • However, because file_search replaces Retrieval completely, there is no version of the client/SDK that has both of these tools.

So:

  • What should I do?
  • Is Retrieval actually just vector stores under the hood, and therefore equally bad (or good) at summarization, so that I should just migrate fully to file_search? (A minimal sketch of what that migration would look like is below.)
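
For reference, this is roughly what a full migration to file_search would look like with the openai Python SDK. It is a minimal sketch: the vector store name, file path, and model are placeholders, and I'm assuming the 2024-era v2 beta where vector stores live under client.beta.

```python
from openai import OpenAI

client = OpenAI()

# Create a vector store and attach an uploaded file to it for file_search (v2).
vector_store = client.beta.vector_stores.create(name="my_docs")  # placeholder name
uploaded = client.files.create(
    file=open("manual.pdf", "rb"),  # placeholder path
    purpose="assistants",
)
# Indexing is asynchronous; the file is chunked and embedded in the background.
client.beta.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id=uploaded.id,
)

# The v2 assistant can only *search* these files; there is no Retrieval tool.
assistant = client.beta.assistants.create(
    model="gpt-4-turbo",
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)
```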

Retrieval had a file reader, where the AI could “scroll” and load pages of documents into a growing context through multiple model calls.

File search has cut the internal myfiles_browser tool down to search as the only method, now powered by OpenAI embeddings, with no ability for further exploration or file reading.

So a search for “how to fix pinball machines” will not then let the AI summarize a single document from the returned chunks, except by luck or happenstance.

A semantic search that was also backed by a document reconstructor could give a more straightforward view of the relevant parts of a document.
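
Roughly what I mean by a reconstructor, as a hypothetical sketch over your own chunk store (the Chunk schema and the stitching logic are my assumptions; file_search itself does not expose chunk positions):

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str    # which source document the chunk came from
    position: int  # order of the chunk within that document
    text: str

def reconstruct_window(chunks: list[Chunk], hit: Chunk, radius: int = 2) -> str:
    """Stitch a contiguous slice of the source document back together
    around a semantic-search hit, using your own chunk index."""
    same_doc = sorted(
        (c for c in chunks if c.doc_id == hit.doc_id),
        key=lambda c: c.position,
    )
    neighbors = [c for c in same_doc if abs(c.position - hit.position) <= radius]
    return "\n".join(c.text for c in neighbors)
```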

However, to truly summarize, you need a view of the whole document, or the whole document condensed into AI-generated chunk summaries, that can fit in context.

Thanks for the prompt response @_j

So what I’m hearing is that yes, file_search lacks the summarization/file reading capabilities Retrieval had (via scrolling). Correct?

Do you know if/when support for this type of use case (summarization) will return to the API?

Thank you for the suggestion of creating a document reconstructor. That’s probably too involved for my purposes, and I’m considering forking v1.20.0 so that I can have both retrieval and file_search. Any reason I shouldn’t do that?

I cannot predict the future. I only know that the past was an expensive waste that did not specialize in anything, while the current is a waste that returns irrelevant results, and that still costs you double whenever the output is an AI tool call instead of a response to the user, rather than having the search results injected automatically. Image recognition is not attempted at all.

The best implementation would be your own document extractor, where the user can see whether the extraction was done well and produced clean plain text (and even functions). You can then place the user-provided document into the AI’s context in full to obtain a summary, or employ a cheaper AI that turns 2000-token chunks into 500-token summaries that can ultimately be passed along.

That path can be chosen deliberately for summary tasks, instead of automatically augmenting the AI’s knowledge via search. The best instruction comes from the user: “summarize this document: {all the text}”.
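
A rough sketch of that chunk-summary path (the model names, token sizes, and tiktoken splitting are illustrative assumptions, not part of anything the Assistants tools do for you):

```python
import tiktoken
from openai import OpenAI

client = OpenAI()
enc = tiktoken.get_encoding("cl100k_base")

def split_by_tokens(text: str, max_tokens: int = 2000) -> list[str]:
    """Split extracted plain text into roughly 2000-token chunks."""
    tokens = enc.encode(text)
    return [enc.decode(tokens[i:i + max_tokens])
            for i in range(0, len(tokens), max_tokens)]

def summarize(text: str, model: str, limit: int) -> str:
    """One 'summarize this document' call, capped at `limit` output tokens."""
    response = client.chat.completions.create(
        model=model,
        max_tokens=limit,
        messages=[{"role": "user",
                   "content": f"Summarize this document:\n\n{text}"}],
    )
    return response.choices[0].message.content

def summarize_document(full_text: str) -> str:
    # Map: a cheaper model condenses each ~2000-token chunk to ~500 tokens.
    partials = [summarize(c, model="gpt-3.5-turbo", limit=500)
                for c in split_by_tokens(full_text)]
    # Reduce: the combined chunk summaries now fit in a single context window.
    return summarize("\n\n".join(partials), model="gpt-4-turbo", limit=1000)
```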

I only know that the past was an expensive waste that did not specialize in anything, while the current is a waste that returns irrelevant results

Hahaha, noted. Thanks for your help, Jay; I appreciate the insight on the extractor.

If there’s anyone from OpenAI who can speak to the roadmap and when we might see summarization capabilities again, please comment.