Retrieving Content from Vector Store Based on Index and File ID in File Search Assistant

hussein.a.abusetta · July 4, 2024, 12:32am

Hi everyone,

I am developing a Retrieval-Augmented Generation (RAG) chatbot that answers questions based on documents using the File Search Assistant. Everything is working well, but I need some help with retrieving specific content from the vector store.

In the response, the “annotation” provides a start_index, end_index, and file_id. Is there a way to use these values to retrieve the exact content from the vector store?

Any advice or examples on how to achieve this would be greatly appreciated!

_j · July 4, 2024, 4:42am

Nope, zero documentation at all about possibilities in the message object returned, and no way to inspect document extraction chunks except to twist the AI into returning one as language.

class FileCitation(BaseModel):
    file_id: str
    """The ID of the specific File the citation is from."""


class FileCitationAnnotation(BaseModel):
    end_index: int

    file_citation: FileCitation

    start_index: int

    text: str
    """The text in the message content that needs to be replaced."""

    type: Literal["file_citation"]
    """Always `file_citation`."""

hussein.a.abusetta · July 4, 2024, 2:07pm

Thank you for your response. I’m a bit confused by your statement: “twist the AI into returning one as language.”

Could you please elaborate on what you mean by this? Specifically, how can I twist the AI to return the document extraction chunks as language?

_j · July 4, 2024, 4:05pm

Twist: Warp the AI’s understanding to do what it doesn’t want to do with imposition of a different role or language than might be expected. Lie about the situation. Create a jailbreak. Reweight token calculations.

The default behavior is to avoid dumping out a developer’s documents verbatim (and there is limited output to do so anyway, vs how much is loaded into memory.) So to perform diagnostics on the quality of the extraction or the similarity results, you have to ask “nicely”.

Topic		Replies	Views
How do you find the used bit of text with file search? API	2	129	March 25, 2025
Chat with one file in a multi file vector store or combine vector stores API	4	46	May 17, 2025
Mapping assistants API annotations back to the location in the source file API assistants , assistants-api	5	2901	September 20, 2024
Assistant file search text retrieval API assistants-api	25	4824	March 18, 2025
Using threads vs chat completions API	4	2724	May 15, 2024

Retrieving Content from Vector Store Based on Index and File ID in File Search Assistant

Related topics