Assistants API File Search and Vector Stores

Hey There, dear OpenAI Forum people and hopefully OpenAI Devs!

We have been working on a RAG assistant using the Assistants API together with File Search and Vector Stores. I have to say, I’m really impressed with how well it generally works. However, there are a few things missing for me, especially in how it is documented.

My Questions:

  1. Is there any way to see which search query is generated by the assistant for the file search?

  2. Is there any way to see what the search results are that get passed to the assistant?

  3. How does the file search work if you have attached a vector store both to the assistant and the thread? Are they weighted the same, is only one vector store searched, does one have priority, etc.?

  4. Are file names included in the file search? Would it make sense to use certain labeling or even include metadata in the file names or are they not taken into account?

All these points should be viewable somewhere, for example in the logs of the assistant run steps. To me it seems that none of this is currently visible anywhere, even though being able to see it would be instrumental for improving file search accuracy (through file organization and prompting, for example) and for understanding what exactly happens during the file search.

  5. How are things looking in terms of the new capabilities mentioned in the documentation, i.e. metadata for pre-search filtering, searching across structured file types, and image parsing? Also, any update on when the quotes for the citations will be added back into v2? I know, probably hard to tell, but an update on the progress would be huge.

Cheers and Thanks!


I want to add my support here.

File Search and Vector Stores in the Assistants API seem powerful and like an important part of the future. We’d like to build on top of that to reduce our dependency count.

However, currently working with File Search and Vector stores is difficult because there does not seem to be a way to see:

  • The search query generated by the assistant
  • The contents retrieved from the search

This is critical information for debugging and improving relevance. Without that, as a developer, it feels like stumbling around in the dark.

Does anyone know how to inspect these things now? Are there plans to add a way to do this?


I’m afraid that all the semantic search behind the Assistants API is a black box for all of us. They are adding some parameters for configuration, but no more than that.


Thanks!

Yes, it seems to me that there is no way to do this. You can sometimes get the search query by prompting the model to reveal it. You can also get a little more insight into how file search works by extracting the system prompt.

Image input capabilities: Enabled

Tools

myfiles_browser
You have the tool myfiles_browser with these functions:
msearch(queries: list[str]) Issues multiple queries to a search over the file(s) uploaded in the current conversation and displays the results.
please render in this format: 【{message idx}†{link text}】

Tool for browsing the files uploaded by the user.

Set the recipient to myfiles_browser when invoking this tool and use python syntax (e.g. msearch(['query'])). "Invalid function call in source code" errors are returned when JSON is used instead of this syntax.

Parts of the documents uploaded by users will be automatically included in the conversation. Only use this tool when the relevant parts don’t contain the necessary information to fulfill the user’s request.

Think carefully about how the information you find relates to the user’s request. Respond as soon as you find information that clearly answers the request.

Issue multiple queries to the msearch command only when the user’s question needs to be decomposed to find different facts. In other scenarios, prefer providing a single query. Avoid single-word queries that are extremely broad and will return unrelated results.

Here are some examples of how to use the msearch command:
User: What was the GDP of France and Italy in the 1970s? => msearch(["france gdp 1970", "italy gdp 1970"])
User: What does the report say about the GPT4 performance on MMLU? => msearch(["GPT4 MMLU performance"])
User: How can I integrate customer relationship management system with third-party email marketing tools? => msearch(["customer management system marketing integration"])
User: What are the best practices for data security and privacy for our cloud storage services? => msearch(["cloud storage security and privacy"])

Please provide citations for your answers and render them in the following format: 【{message idx}:{search idx}†{link text}】.

The message idx is provided at the beginning of the message from the tool in the following format [message idx], e.g. [3].
The search index should be extracted from the search results, e.g. # 【13†Paris†4f4915f6-2a0b-4eb5-85d1-352e00c125bb】 refers to the 13th search result, which comes from a document titled “Paris” with ID 4f4915f6-2a0b-4eb5-85d1-352e00c125bb.
For this example, a valid citation would be 【3:13†Paris】.

All 3 parts of the citation are REQUIRED.

However, I only managed once to get the model to print what might have been the actual search results. I do hope they will address this in a future update, as this is still in beta.
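If it helps, the "prompt the model to reveal the query" trick I mentioned looks roughly like this on my end. This is just a sketch, the instruction wording is my own, and the model does not always comply:

```python
from openai import OpenAI

client = OpenAI()

# Best-effort only: this relies on the model following the instruction,
# so treat the echoed queries as indicative rather than authoritative.
assistant = client.beta.assistants.create(
    model="gpt-4o",
    tools=[{"type": "file_search"}],
    instructions=(
        "Before answering, state verbatim the search queries you issued "
        "to the file search tool, then give your answer with citations."
    ),
)
```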

Thanks for your response and let’s hope some OpenAI folks will post an update.


@sashirestela @sjb14
In case anyone missed it, they just rolled out a feature for inspecting the file search results:

https://platform.openai.com/docs/assistants/tools/file-search/improve-file-search-result-relevance-with-chunk-ranking

That’s some great progress and really helps with improving file search. Let’s hope there is more to come.
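For anyone who wants a concrete starting point, here is a minimal sketch of pulling the retrieved chunks and their scores out of the run steps with the Python SDK (the thread and run IDs are placeholders; double-check the field names against the linked docs):

```python
from openai import OpenAI

client = OpenAI()

# List the steps of a completed run and ask the API to include the
# file search result contents (they are omitted by default).
run_steps = client.beta.threads.runs.steps.list(
    thread_id="thread_abc123",  # placeholder
    run_id="run_abc123",        # placeholder
    include=["step_details.tool_calls[*].file_search.results[*].content"],
)

for step in run_steps.data:
    if step.step_details.type != "tool_calls":
        continue
    for call in step.step_details.tool_calls:
        if call.type != "file_search":
            continue
        for result in call.file_search.results:
            # Each result carries the source file and a relevance score.
            print(result.file_name, result.score)
            for chunk in result.content or []:
                print(chunk.text[:200])  # first part of the retrieved chunk
```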


Oh, boy! When did they sneak that in there? Good find.

I bet that will help reduce input tokens for search. I’ll bet further that you’ll be able to use keyword ranking whenever they release that ability.

In the meantime, I have found that using descriptive names for files can significantly impact search.
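In case it is useful, this is roughly how that looks when adding files to a vector store with the Python SDK (a sketch; the store name and file name are made up):

```python
from openai import OpenAI

client = OpenAI()

vector_store = client.beta.vector_stores.create(name="product-docs")

# A descriptive file name (rather than "doc1.pdf") is what shows up
# alongside the retrieved chunks, so it should carry real meaning.
uploaded = client.files.create(
    file=open("acme-crm-integration-guide-2024.pdf", "rb"),
    purpose="assistants",
)

client.beta.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id=uploaded.id,
)
```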


Thanks for the heads up! If you are interested, take a look at the OpenAI Java Client, which has been updated with that new feature:


I have configured an Assistant with file_search and asked this question: ‘Give me 1 problem presented in the documents.’

When inspecting the file search results, I see that there are 20 results with rank scores varying from 0.48 to 0.07.

In the assistant’s answer, there is one annotation, and the referenced document is a result that had a rank score of 0.09. Does anyone know how this works? Why did it not use the highest-ranking chunk for its answer?

It seems to me that how we see the search results in the logs is not exactly how they get passed to the assistant. The model probably does not have access to, or can’t “see”, the score and other information; it just picks whichever one it thinks is most relevant. If you want to exclude results below a certain score threshold, you must set that yourself. However, I am not completely sure, so take this with a grain of salt.


It is what I have been thinking as well!

I think the score is actually the cosine similarity between the embedding of each chunk being searched and the embedding of the search term(s).

And, yes, if you want to exclude the lower results, you can set a floor at either the Assistant or the Run level.

@afksam It’s searching whatever has any degree of relevance within the bounds you’ve given it. The default number of results for a 4o model is 20, so that’s part of the reason you’re getting so many less-than-great results.

There’s another challenge here, because you can only set a floor for the Score Threshold.

The results I’m working with have an immediate score threshold of >0.7, vs what you’re seeing with a >0.4.

That means we’d have to examine the results before we could safely reduce them programmatically because we can’t set a range, e.g. 0.4 >= score >= 0.7.
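For reference, here is roughly where those knobs live, as far as I can tell from the docs (a sketch with placeholder IDs; the same file_search block can also be set on the assistant itself):

```python
from openai import OpenAI

client = OpenAI()

# Override the file_search settings for a single run: cap the number of
# results and drop anything below a chosen score floor.
run = client.beta.threads.runs.create(
    thread_id="thread_abc123",   # placeholder
    assistant_id="asst_abc123",  # placeholder
    tools=[
        {
            "type": "file_search",
            "file_search": {
                "max_num_results": 8,  # default is 20 on 4o models
                "ranking_options": {
                    "ranker": "auto",
                    "score_threshold": 0.5,  # a floor only; no upper bound
                },
            },
        }
    ],
)
```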

Hey Sean, the score cannot be smaller than or equal to 0.4 and bigger than or equal to 0.7 at the same time. Might be a typo.

Did you mean 0.4 <= score <= 0.7 ?

Does anyone have insight into this? In my experience, I believe it creates a slight bias towards the store attached to the thread, but it will still use the assistant’s store as well.
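For anyone following along, this is the setup in question: one store attached at the assistant level and another at the thread level (a sketch with placeholder IDs; how the two are weighted against each other is exactly the part the docs do not spell out):

```python
from openai import OpenAI

client = OpenAI()

# Store attached to the assistant: available to every thread that uses it.
assistant = client.beta.assistants.create(
    model="gpt-4o",
    tools=[{"type": "file_search"}],
    tool_resources={
        "file_search": {"vector_store_ids": ["vs_assistant_level"]}  # placeholder
    },
)

# Store attached to one thread: scoped to that conversation only.
thread = client.beta.threads.create(
    tool_resources={
        "file_search": {"vector_store_ids": ["vs_thread_level"]}  # placeholder
    },
)
```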