On OpenAI Assistants, I’m searching for information that is spread across different parts of files inside a Vector Store. Those files have more than 100 pages.
This is an example of what I usually search for: “Find an accident between a car and a horse in 2017, and give me the data of the people involved.”
The file describes the whole accident, but the NAME of the person is on page 100 (chunk 150 in the vector store, for example), and the details of the accident (car, horse, 2017) are on page 102 (chunk 158, for example).
Given this, the assistant finds all the chunks related to accidents between horses and cars (chunks 5, 158, 300, 500, etc.), but it doesn’t retrieve the chunk with the personal information (150), so it just hallucinates.
I think the way to fix it is: “If you find what you want in chunk X, retrieve chunk X but also X-1, X-2, X+1, and X+2”. With a large enough window, that should be enough to find both the accident details and the details of the people involved.
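To make the idea concrete, here is a minimal sketch in plain Python of the behaviour I have in mind. It does not call the Assistants API, and I am not claiming Assistants exposes anything like it; `expand_with_neighbors`, `window`, and `total_chunks` are names I made up for illustration. It also assumes chunks are numbered in document order, which I don't know to be true of the vector store internally.

```python
def expand_with_neighbors(hit_ids, window, total_chunks):
    """Return sorted chunk indices covering each hit plus `window` neighbors per side.

    hit_ids      -- chunk indices returned by the similarity search (hypothetical)
    window       -- how many chunks before and after each hit to include
    total_chunks -- number of chunks in the file, used to clamp the range
    """
    expanded = set()
    for hit in hit_ids:
        first = max(0, hit - window)                 # do not go below the first chunk
        last = min(total_chunks - 1, hit + window)   # do not go past the last chunk
        expanded.update(range(first, last + 1))
    return sorted(expanded)                          # overlapping windows are merged by the set


# A hit on chunk 158 with a window of 2 pulls in chunks 156 to 160.
print(expand_with_neighbors([158], window=2, total_chunks=600))
```

Note that with the example numbers above (name in chunk 150, accident in chunk 158), the window would have to be at least 8 to reach the name, so the offset would probably need to be configurable rather than fixed at 2.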
Is there a way to configure Assistants to include that “chunk offset”?
If not… it may be a good new feature.
Assistants currently allow me to set how many chunks should be retrieved. Why not add an offset there? Check screenshot below
