My retrieval use case is dominated by one major category (“candidate”), which has seven values now but may grow to hundreds within months.
Does anyone have tips on how to organize my information so that the model can efficiently limit its search to one candidate? My fear is that pure embeddings don’t do this well, and there is too much material to fit everything into the context.
From the model itself, I understand that its tool can do more than embedding search. However, I suspect that what it calls “keyword search” is still purely embedding-based. Is that right?
A simple option in my case would be to put candidate-specific information into separate files, like candidate_name.txt, but it seems that the model cannot target single files with its search, which would rule that option out. Is this correct?
(This is all with the retrieval included with Assistants, of course. Doing search with hard keywords would be easier if I had my own RAG.)
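
To make clear what I mean by hard keywords in my own RAG, here is a minimal sketch. It does not use the Assistants API at all. The names (Chunk, search, CHUNKS) and the sample data are made up for illustration, and the word-overlap score is only a stand-in for real embedding similarity.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    candidate: str  # hard metadata key, e.g. the candidate's name
    text: str


def tokenize(text: str) -> set[str]:
    # Crude lowercase word split; a real system would use embeddings here.
    return {w.strip(".,;:!?()\"'").lower() for w in text.split()} - {""}


def search(chunks, candidate, query, top_k=3):
    """Filter by exact candidate first, then rank the survivors by word overlap."""
    query_terms = tokenize(query)
    # Hard filter: only chunks tagged with this candidate are considered.
    pool = [c for c in chunks if c.candidate == candidate]
    scored = [(len(query_terms & tokenize(c.text)), c) for c in pool]
    scored = [(s, c) for s, c in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]


# Made-up sample data, for illustration only.
CHUNKS = [
    Chunk("alice", "Alice supports a higher minimum wage."),
    Chunk("bob", "Bob opposes a higher minimum wage."),
    Chunk("alice", "Alice wants more funding for public transport."),
]

results = search(CHUNKS, "alice", "minimum wage")
for chunk in results:
    print(chunk.candidate, "|", chunk.text)
```

The point is that the candidate filter is exact and runs before any similarity ranking, so Bob's nearly identical sentence can never leak into a search about Alice. That guarantee is what I am unsure I can get from the built-in retrieval.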