Mapping Assistants API annotations back to their location in the source file

I am using the Assistants API with retrieval. Does anyone know how to map the annotations back to the source?
I’m currently trying with a JSON file, but so far the information, such as the quote or the start and end indices, hasn’t been much use in locating the cited passage in the original JSON I uploaded.


I’m having the same problem, I think. I understand the Assistants API is still in beta, so perhaps this needs to be a feature request. I see something like this in the annotations response:

  "annotations": [
    {
      "type": "file_citation",
      "text": "【11†source】",
      "start_index": 613,
      "end_index": 624,
      "file_citation": {
        "file_id": "file-G8tCTIryxew2lZVn3h3GhTpF",
        "quote": "nadal-confirms-a-return-to-tennis-at-the-brisbane-international-20231201-p5eoha.html?ref=rss\",\"description\":\"The 22-time grand slam champion has not played a competitive match since bowing out in the second round of this year’s Australian Open"
      }
    }
  ]
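For anyone pulling these fields out programmatically, here is a minimal sketch (assuming the response shape shown above, with `file_citation` annotations) that walks a message content payload and collects the file ID and quote for each citation:

```python
# Sketch: extract file citations from an Assistants API message payload.
# The dict shape mirrors the annotation excerpt above; field names may
# differ across API versions, so verify against your own responses.

def extract_citations(message_content: dict) -> list[tuple[str, str]]:
    """Return (file_id, quote) pairs for every file_citation annotation."""
    citations = []
    for annotation in message_content.get("annotations", []):
        if annotation.get("type") == "file_citation":
            cite = annotation.get("file_citation", {})
            citations.append((cite.get("file_id"), cite.get("quote")))
    return citations

sample = {
    "annotations": [
        {
            "type": "file_citation",
            "text": "【11†source】",
            "start_index": 613,
            "end_index": 624,
            "file_citation": {
                "file_id": "file-G8tCTIryxew2lZVn3h3GhTpF",
                "quote": "nadal-confirms-a-return-to-tennis",
            },
        }
    ]
}

print(extract_citations(sample))
# → [('file-G8tCTIryxew2lZVn3h3GhTpF', 'nadal-confirms-a-return-to-tennis')]
```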

Not sure about the OP, but my JSON is an array of objects, so I would like the citation to include either the index of the object in the array or an “id” property of the object, e.g.

  "annotations": [
    {
      "type": "file_citation",
      "text": "【11†source】",
      "index": 3
    }
  ]

or,

  "annotations": [
    {
      "type": "file_citation",
      "text": "【11†source】",
      "id": "asdf"
    }
  ]
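Until something like this exists server-side, one workaround is to post-process the quote yourself: search your original JSON array for the object whose serialized form contains the quoted text, and report its index or “id”. A rough sketch (the matching is naive substring search, so it will miss quotes the model paraphrased or truncated):

```python
import json

def locate_quote(records: list[dict], quote: str):
    """Return the index of the first record whose JSON serialization
    contains the quoted text, or None if nothing matches."""
    for i, record in enumerate(records):
        if quote in json.dumps(record, ensure_ascii=False):
            return i
    return None

records = [
    {"id": "a1", "description": "First article"},
    {"id": "a2", "description": "Nadal confirms a return to tennis"},
]

idx = locate_quote(records, "return to tennis")
print(idx, records[idx]["id"])  # → 1 a2
```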

Having the same problem. I asked ChatGPT (GPT-4) and it said that the start and end indices should relate to the characters extracted from (in my case) a PDF. So before I uploaded the PDF, I went through it page by page and recorded the start and end index for each page by counting the characters on the page. This did not work: when I later made requests to the assistant, it always gave back indices that matched the first page. I also tried a follow-up question, “What page did this source come from?”, but that didn’t work either; the answers were never quite right. One more thing: the quotes that came back with the annotations sometimes seemed inexact. When I tried using the quotes to find the text within the PDF, I could sometimes, but not always.
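Since exact matching on the quote fails when the wording drifts, fuzzy matching may work better: compare the returned quote against each page’s extracted text and pick the page with the longest contiguous overlap. A sketch using only the standard library (extracting the per-page text, e.g. with pypdf, is assumed to happen elsewhere):

```python
import difflib

def best_matching_page(pages: list[str], quote: str) -> int:
    """Return the index of the page whose text shares the longest
    contiguous run of characters with the quote. Crude, but tolerant
    of the small wording differences the annotations sometimes show."""
    best_idx, best_size = 0, -1
    for i, text in enumerate(pages):
        matcher = difflib.SequenceMatcher(None, quote, text)
        match = matcher.find_longest_match(0, len(quote), 0, len(text))
        if match.size > best_size:
            best_idx, best_size = i, match.size
    return best_idx

pages = [
    "Introduction and table of contents.",
    "The champion has not played a competitive match since the Open.",
    "Appendix: statistics and records.",
]
quote = "has not played a competitve match"  # note the inexact spelling
print(best_matching_page(pages, quote))  # → 1
```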

Have you got this to work yet? I was also trying to retrieve the exact text using the annotations, and I came to the conclusion that the start and end indices refer to positions in the response text (not the original file) where the annotation markers were inserted.
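You can check this yourself: slicing the message text by start_index and end_index yields the citation marker itself, not a location in your upload. A small illustration (the indices here are computed locally as stand-ins for the annotation’s fields):

```python
# The indices point into the assistant's reply, not into your file.
message_text = (
    "Nadal has confirmed a return to tennis at the "
    "Brisbane International【11†source】."
)

start_index = message_text.index("【")     # stand-in for annotation.start_index
end_index = message_text.index("】") + 1   # stand-in for annotation.end_index

print(message_text[start_index:end_index])  # → 【11†source】

# A common cleanup step is to replace the marker with a numbered footnote:
clean = message_text[:start_index] + "[1]" + message_text[end_index:]
print(clean)
```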

I can’t even get a file search via the API to see my files.

Would you be against sharing what you’re doing to make this work?

I have:

  1. Created an Assistant in the Playground with a Vector Store that has one File.
  2. Via the API, created a Thread and attached a Vector Store with one File.
  3. Created a Run and asked “What files can you see?”

I’ve tried various combinations of the tools and tool_resources params, but no matter what I try, I cannot get the Run to return any message other than “Can’t find any files”.
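For reference, here is my understanding of the wiring for the steps above with the official openai Python SDK. The IDs are placeholders and the parameter shapes reflect the beta (v2, file_search) API as I understand it, so treat this as a sketch rather than a verified recipe:

```python
# Sketch: attach a vector store to a thread via tool_resources
# (Assistants v2 / file_search). All IDs below are placeholders.

def build_thread_params(vector_store_id: str) -> dict:
    """Build the kwargs for client.beta.threads.create(...)."""
    return {
        "tool_resources": {
            "file_search": {"vector_store_ids": [vector_store_id]}
        },
        "messages": [
            {"role": "user", "content": "What files can you see?"}
        ],
    }

params = build_thread_params("vs_placeholder123")
print(sorted(params))  # → ['messages', 'tool_resources']

# With the SDK, the calls would then look roughly like:
#   from openai import OpenAI
#   client = OpenAI()
#   thread = client.beta.threads.create(**params)
#   run = client.beta.threads.runs.create_and_poll(
#       thread_id=thread.id,
#       assistant_id="asst_placeholder",  # the assistant must also have
#       tools=[{"type": "file_search"}],  # the file_search tool enabled
#   )
```

Note that in v2 the tool is named file_search (the older retrieval tool was v1), and the assistant itself must have the tool enabled in addition to the thread carrying the vector store.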