Mapping Assistants API annotations back to their location in the source file

I am using the Assistants API with retrieval. Does anyone know how to map the annotations back to the source?
I’m currently trying with a JSON file, but so far the annotation fields, such as the quote or the start and end indices, aren’t much help in locating the passage in the original JSON I uploaded.

I think I’m having the same problem. I understand the Assistants API is still in beta, so perhaps this needs to be a feature request. I see something like this in the annotations response:

  "annotations": [
    {
      "type": "file_citation",
      "text": "【11†source】",
      "start_index": 613,
      "end_index": 624,
      "file_citation": {
        "file_id": "file-G8tCTIryxew2lZVn3h3GhTpF",
        "quote": "nadal-confirms-a-return-to-tennis-at-the-brisbane-international-20231201-p5eoha.html?ref=rss\",\"description\":\"The 22-time grand slam champion has not played a competitive match since bowing out in the second round of this year’s Australian Open"
      }
    }
  ]
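
One observation about the response above: the indices span 624 - 613 = 11 characters, which is exactly the length of the marker 【11†source】. That suggests start_index and end_index are offsets into the assistant's message text (where the marker sits), not offsets into the uploaded file. A minimal Python sketch under that assumption: `apply_annotations` and the sample message are hypothetical, the annotations are plain dicts rather than SDK objects, and only the field names come from the response above.

```python
def apply_annotations(message_text, annotations):
    """Swap each citation marker for a footnote like [1] and collect the cited quotes.

    Assumption: start_index/end_index are offsets into message_text.
    """
    ordered = sorted(annotations, key=lambda a: a["start_index"])
    pieces, footnotes, cursor = [], [], 0
    for number, ann in enumerate(ordered, start=1):
        # Keep the text before the marker, then drop in the footnote number.
        pieces.append(message_text[cursor:ann["start_index"]])
        pieces.append(f"[{number}]")
        cursor = ann["end_index"]
        citation = ann.get("file_citation", {})
        footnotes.append((number, citation.get("file_id"), citation.get("quote")))
    pieces.append(message_text[cursor:])
    return "".join(pieces), footnotes


# Hypothetical reply text; the marker format matches the response above.
marker = "【11†source】"
reply = "Nadal will return in Brisbane" + marker + "."
start = reply.index(marker)
sample = [{
    "type": "file_citation",
    "text": marker,
    "start_index": start,
    "end_index": start + len(marker),
    "file_citation": {"file_id": "file-example", "quote": "example quote"},
}]

clean_text, notes = apply_annotations(reply, sample)
print(clean_text)
print(notes)
```

If this reading is right, the indices only say where to put a footnote in the reply; the file_id and quote are the only fields that point back at the source.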

I’m not sure about the OP, but my JSON is an array of objects, so I would like the citation to give either the index of the object in the array or an “id” property of that object, e.g.

  "annotations": [
    {
      "type": "file_citation",
      "text": "【11†source】",
      "index": 3
    }
  ]

or,

  "annotations": [
    {
      "type": "file_citation",
      "text": "【11†source】",
      "id": "asdf"
    }
  ]
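
Until the API returns something like an index or id, one possible workaround is to look the quote up in the raw text of the uploaded file and work out which array element contains it. A minimal Python sketch, assuming the file is a top-level JSON array and that the quote appears verbatim in the raw file text (it may not, since the quote seems to be cut from the serialized JSON and can span several keys). The helper names `element_spans` and `locate_quote` are hypothetical.

```python
import json


def element_spans(raw):
    """Return (start, end) character offsets of each top-level array element."""
    decoder = json.JSONDecoder()
    spans = []
    pos = raw.index("[") + 1
    while True:
        # raw_decode does not skip leading whitespace, so step over it (and commas).
        while pos < len(raw) and raw[pos] in " \t\r\n,":
            pos += 1
        if pos >= len(raw) or raw[pos] == "]":
            break
        _, end = decoder.raw_decode(raw, pos)
        spans.append((pos, end))
        pos = end
    return spans


def locate_quote(raw, quote):
    """Return the index of the array element containing the quote, or None."""
    offset = raw.find(quote)
    if offset == -1:
        return None
    for index, (start, end) in enumerate(element_spans(raw)):
        if start <= offset < end:
            return index
    return None


# Made-up file contents for illustration only.
raw_file = '[{"id":"a","description":"first"},{"id":"b","description":"second item"}]'
print(locate_quote(raw_file, '"b","description":"second'))
```

Once the element index is known, `json.loads(raw_file)[index]["id"]` would give the “id” property, assuming the objects have one.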

I’m having the same problem. I asked ChatGPT (GPT-4), and it said that the start and end indices should correspond to character positions in the text extracted from the source (in my case, a PDF). So before uploading the PDF, I went through it page by page and recorded the start and end index of each page by counting the characters on it. This did not work: when I later made a request to the assistant, it always returned indices that mapped to the first page.
I also tried asking a follow-up question, “What page did this source come from?”, but that didn’t work either; the answers were never quite right.
One more thing: the quotes returned with the annotations were sometimes not exact. I tried using them to find the text within the PDF, and I could sometimes, but not always.
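
Since the quotes are not always exact, an approximate match against each page's text may work better than an exact search. A minimal Python sketch using the standard library's difflib, assuming the page texts have already been extracted into a list of strings (the extraction step is not shown, and `best_page`, `normalize`, and the 0.6 threshold are hypothetical choices, not anything the API provides).

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Collapse whitespace and lowercase so layout differences matter less."""
    return re.sub(r"\s+", " ", text).strip().lower()


def best_page(quote, pages, min_ratio=0.6):
    """Return (page_number, ratio) for the page sharing the longest run with the quote.

    ratio is the longest common substring length divided by the quote length.
    page_number is 1-based, or None if no page reaches min_ratio.
    """
    q = normalize(quote)
    best_number, best_ratio = None, 0.0
    for number, page in enumerate(pages, start=1):
        p = normalize(page)
        matcher = SequenceMatcher(None, q, p, autojunk=False)
        match = matcher.find_longest_match(0, len(q), 0, len(p))
        ratio = match.size / len(q) if q else 0.0
        if ratio > best_ratio:
            best_number, best_ratio = number, ratio
    if best_ratio < min_ratio:
        return None, best_ratio
    return best_number, best_ratio


# Made-up page texts for illustration only.
pages = [
    "Intro text about tennis schedules.",
    "The 22-time grand slam champion has not played a competitive match since January.",
]
quote = "The 22-time grand slam champion has not  played a competitive match since bowing out"
print(best_page(quote, pages))
```

The threshold trades false matches against misses, so it would need tuning on real documents; a quote that straddles a page break would also need the neighbouring pages joined before matching.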