I'm also looking for a similar solution, but haven't found anything promising.
I managed to get some more info on how file search actually works, but unfortunately this is not documented…
Here’s a quick rundown.
The model you specified (GPT-3.5, GPT-4, GPT-4o, etc.) outputs a search query to the search tool. The call looks like this:
msearch(["Search Query generated by the Assistant"])
File search then runs a semantic and keyword search to find the most relevant results. As far as I can tell, the results are re-ranked or filtered before they are passed to the assistant, so only the most relevant ones get through.
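To make the idea of "semantic plus keyword search, then keep only the top results" concrete, here is a toy sketch in Python. It is not how OpenAI implements file search (that part is undocumented). The function names, the 50/50 weighting, and the character-trigram similarity used as a stand-in for real embedding similarity are all my own assumptions for illustration.

```python
import math
import re
from collections import Counter


def keyword_score(query: str, chunk: str) -> float:
    """Fraction of query words that appear verbatim in the chunk."""
    query_words = set(re.findall(r"\w+", query.lower()))
    chunk_words = set(re.findall(r"\w+", chunk.lower()))
    return len(query_words & chunk_words) / len(query_words) if query_words else 0.0


def trigram_vector(text: str) -> Counter:
    """Count character trigrams; a crude stand-in for an embedding."""
    lowered = text.lower()
    return Counter(lowered[i:i + 3] for i in range(len(lowered) - 2))


def similarity_score(query: str, chunk: str) -> float:
    """Cosine similarity between the two trigram vectors."""
    a, b = trigram_vector(query), trigram_vector(chunk)
    dot = sum(a[gram] * b[gram] for gram in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_search(query: str, chunks: list[str], top_k: int = 2, weight: float = 0.5) -> list[str]:
    """Blend both scores, sort by the blended score, and keep the top_k chunks."""
    scored = [
        (weight * keyword_score(query, c) + (1 - weight) * similarity_score(query, c), c)
        for c in chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


chunks = [
    "Refunds are processed within five business days.",
    "Our office is closed on public holidays.",
    "To request a refund, contact support with your order number.",
]
top = hybrid_search("refund request", chunks, top_k=2)
```

With these sample chunks, the unrelated "office" chunk is the one that gets dropped, which matches the behaviour I described: the assistant only ever sees the survivors of the ranking step.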
The search result(s) that the assistant receives look like this:
[
  {
    "message_idx": 12,
    "search_idx": 0,
    "text": "Text from the file, i.e the search result. This text is exactly as it is in your source document.",
    "source": "sourcefile.txt"
  }
]
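If you manage to get the model to print results in this shape, a small parser makes them easier to work with. The sketch below is Python and assumes the output is valid JSON with the four fields shown above. The helper names are hypothetical, and since the format is undocumented it may change without notice. The fallback that swaps typographic quotes for straight ones is only a convenience for text copied from a rendered page, and it will fail if the file text itself contains typographic quotes.

```python
import json

# Maps typographic double quotes to straight ones (only used as a fallback).
QUOTE_MAP = str.maketrans({"\u201c": '"', "\u201d": '"'})


def parse_search_results(raw: str) -> list:
    """Parse the printed result text into a list of dicts."""
    try:
        results = json.loads(raw)
    except json.JSONDecodeError:
        # Retry after normalising curly quotes picked up from copy/paste.
        results = json.loads(raw.translate(QUOTE_MAP))
    if not isinstance(results, list):
        raise ValueError("expected a JSON array of search results")
    return results


def texts_by_source(results: list) -> dict:
    """Group the returned text snippets by the file they came from."""
    grouped = {}
    for item in results:
        source = item.get("source", "unknown")
        grouped.setdefault(source, []).append(item.get("text", ""))
    return grouped


sample = """[
  {
    "message_idx": 12,
    "search_idx": 0,
    "text": "Text from the file.",
    "source": "sourcefile.txt"
  }
]"""

results = parse_search_results(sample)
grouped = texts_by_source(results)
```

Grouping by "source" is handy when you want to check which of your uploaded files the snippets actually came from.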
Unfortunately, this is not visible in the logs of the run steps or anywhere similar, at least not anywhere I could find. Still, I think the results above may be what you are looking for. It took some multi-step prompting to finally get the model to spit out the search results like this. It would be really helpful if OpenAI offered more documentation on this.
I also posted a thread touching on this topic: