Assistants API always returns empty annotations

I found a series of steps that actually resolves this issue for me. The annotations array is no longer empty.

  1. Delete all files from your assistant.
  2. Delete the same files from the Files tab (removes them from the Files repo).
  3. Disable the Retrieval tool.
  4. Save the Assistant.
  5. Re-enable retrieval, and re-upload one file.
  6. Save the Assistant again.

After I performed the above steps, I started receiving data in the annotations array. I started by only uploading one file because I had seen reports that multiple files might be causing the issue. I’ll report back here and update this answer after I upload multiple files if the annotations array once again becomes empty. If that’s the case, I may combine all of my data into one file, as that will work for this particular use case.
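
If you’d rather script that cleanup than click through the UI, here is a rough sketch against the v1 Assistants endpoints of the openai Python SDK (the assistant ID and file name are placeholders):

from openai import OpenAI

client = OpenAI()
assistant_id = "asst_..."  # placeholder

# Steps 1-2: detach every file from the assistant, then delete it from the Files repo.
for f in client.beta.assistants.files.list(assistant_id=assistant_id):
    client.beta.assistants.files.delete(f.id, assistant_id=assistant_id)
    client.files.delete(f.id)

# Steps 3-4: disable the Retrieval tool and save.
client.beta.assistants.update(assistant_id, tools=[])

# Steps 5-6: re-enable retrieval and re-upload one file.
new_file = client.files.create(file=open("data.pdf", "rb"), purpose="assistants")
client.beta.assistants.update(
    assistant_id,
    tools=[{"type": "retrieval"}],
    file_ids=[new_file.id],
)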

As a side note, I also switched to the gpt-4-1106-preview model instead of 3.5-turbo, since that resolved some issues we saw with functions getting called unnecessarily. Not sure whether that had an impact.

I also found another response in the forum that lists an example annotations object, if you need to see one.

Update: Indeed, as soon as I add more than one document, the annotations array becomes empty, again.

2 Likes

We have seen a notable increase in annotations with the right prompting and a more involved sourcing technique (below): always asking for sources, and for annotations on those sources. We’re currently rewriting initial user questions to capture additional details and to request that sources be added.

Separately, since 14/12, and after coincidentally doing the same (cleaning up all unneeded files), annotations seem to come back more reliably.

Finally, to summarize, the only real short-term strategies for making annotations within Assistants work well are:

  • Set up a pre-assistant workflow that lists the files you have by name, asks a separate gpt-3.5 instance which documents should be used for the question, and un-links from the Assistant any files you don’t need for this thread/interaction (see the sketch after this list). This is the only way we have notably increased performance, but it is slow and error-prone when documents have little metadata or poor titles.
  • As mentioned, explicitly request sources in the user prompt, and mention sources, annotations, and retrieval in the Assistant instructions.
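
A minimal sketch of that pre-assistant routing step, assuming the openai Python SDK and the v1 Assistants endpoints (the assistant ID, example question, and prompt wording are placeholders of ours):

from openai import OpenAI

client = OpenAI()
assistant_id = "asst_..."  # placeholder
question = "How do I configure exports?"  # example user question

# List the files currently linked to the assistant, by name.
attached = {f.id: client.files.retrieve(f.id).filename
            for f in client.beta.assistants.files.list(assistant_id=assistant_id)}

# Ask a cheap gpt-3.5 instance which documents the question actually needs.
prompt = (
    "Documents:\n"
    + "\n".join(attached.values())
    + f"\n\nWhich of these documents are needed to answer: {question!r}? "
    "Reply with the exact file names only, one per line."
)
routing = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
relevant = routing.choices[0].message.content

# Un-link anything the router didn't pick, so retrieval has less to wander through.
for file_id, filename in attached.items():
    if filename not in relevant:
        client.beta.assistants.files.delete(file_id, assistant_id=assistant_id)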

For ease of conversation, I will continue to post updates to Assistant Citations/Annotations array is always empty - #22 by jorgeintegrait

Please share other ideas and techniques that work for you. We’ll be using other architectures for production for the time being, but will follow the progress of this limitation until we can rely on it for our clients.

Thanks,

2 Likes

It might be related to the version of the (Python) API library you are using.
Using the API through a Google Cloud Function, I also got empty annotation arrays. In my test environment, however, I consistently got file annotations. Setting the version used in the Google Cloud Function to match my test environment made it work:

Name: openai
Version: 1.3.9
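
If you deploy on Google Cloud Functions, you can pin that version in the function’s requirements.txt so the deployed environment matches the one that works:

openai==1.3.9
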
1 Like

The version fix does work most of the time. However, when several quotes are provided, the annotations array remains empty. For example, 1 or 2 quotes work fine, but 8 or 9 quotes do not. I’m trying to force the assistant to provide a limited number of quotes, but then it either ignores the instruction or limits itself to a single quote…

1 Like

I am experiencing this as well. 【9†source】 appeared when I was using Chinese dialogue, so I used regular expressions to replace the stray markers with empty strings, which temporarily solved the problem. However, if the uploaded file itself contains 【】, that replacement also breaks legitimate return content.
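
For what it’s worth, a more targeted pattern strips only the full citation markers and leaves any legitimate 【】 in file content alone; a minimal sketch in Python (the pattern is mine, based on the marker shapes seen in this thread):

import re

# Matches 【9†source】- and 【4:1†source】-style markers, nothing else.
CITATION_RE = re.compile(r"【\d+(?::\d+)?†[^】]*】")

def strip_citation_markers(text: str) -> str:
    return CITATION_RE.sub("", text)

print(strip_citation_markers("你好【9†source】世界"))  # -> 你好世界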

1 Like

@nikunj This problem is still unaddressed, two months later. The Assistant is still returning source tags that have no corresponding annotations.

In the examples below, I have redacted (with [...redacted...]) parts of the content to protect my company’s confidential information, but you can use the thread_id and message_id to locate this data internally to help you debug.

In a recent case, the source tag 【7†source】 appeared in the output and the annotations field was empty, like this:

{
  "assistant_id": "asst_o5CHcKj8uZSNg567sjRs2E6B",
  "content": [
    {
      "text": {
        "annotations": [],
        "value": "[...redacted...] \u30107\u2020source\u3011."
      },
      "type": "text"
    }
  ],
  "created_at": 1704743170,
  "file_ids": [],
  "id": "msg_XREhks8a35w2OQ5UmMPZ1orn",
  "metadata": {},
  "object": "thread.message",
  "role": "assistant",
  "run_id": "run_wt8cYaz9IU3mdatBY1jfjnc2",
  "thread_id": "thread_6ajZAX71o06kh6DUtsAS622x"
}

In another case, several source tags appeared in the output, but the annotations field only contained annotations for some of them, like this. You can see the source tags 【9†source】【10†source】【11†source】【12†source】【13†source】 in the content, but no annotation for 【13†source】. Also, two different annotations appear for 【10†source】, with the same quote but different start_index and end_index:

{
  "assistant_id": "asst_o5CHcKj8uZSNg567sjRs2E6B",
  "content": [
    {
      "text": {
        "annotations": [
          {
            "end_index": 377,
            "file_citation": {
              "file_id": "file-WDCeoz9qzoqqYkudP4InRiNc",
              "quote": "Simulating [...redacted...] code"
            },
            "start_index": 366,
            "text": "\u301011\u2020source\u3011",
            "type": "file_citation"
          },
          {
            "end_index": 751,
            "file_citation": {
              "file_id": "file-WDCeoz9qzoqqYkudP4InRiNc",
              "quote": "Introduction [...redacted...] problem"
            },
            "start_index": 740,
            "text": "\u301010\u2020source\u3011",
            "type": "file_citation"
          },
          {
            "end_index": 1065,
            "file_citation": {
              "file_id": "file-WDCeoz9qzoqqYkudP4InRiNc",
              "quote": "What's [...redacted...] n"
            },
            "start_index": 1054,
            "text": "\u301012\u2020source\u3011",
            "type": "file_citation"
          },
          {
            "end_index": 1625,
            "file_citation": {
              "file_id": "file-WDCeoz9qzoqqYkudP4InRiNc",
              "quote": "Post [...redacted...] option"
            },
            "start_index": 1615,
            "text": "\u30109\u2020source\u3011",
            "type": "file_citation"
          },
          {
            "end_index": 2237,
            "file_citation": {
              "file_id": "file-WDCeoz9qzoqqYkudP4InRiNc",
              "quote": "Introduction [...redacted...] problem"
            },
            "start_index": 2226,
            "text": "\u301010\u2020source\u3011",
            "type": "file_citation"
          }
        ],
        "value": "To test [...redacted...] campaigns\u301011\u2020source\u3011.\n\n [...redacted...] group\u301010\u2020source\u3011.\n\n [...redacted...]  code\u301012\u2020source\u3011.\n\n [...redacted...] option\u301013\u2020source\u3011.\n\n [...redacted...] donations\u30109\u2020source\u3011. \n\n [...redacted...] done\u301010\u2020source\u3011. [...redacted...] methodologies."
      },
      "type": "text"
    }
  ],
  "created_at": 1704743379,
  "file_ids": [],
  "id": "msg_68nMqZJVUMPu6UD3Q0H0Wph4",
  "metadata": {},
  "object": "thread.message",
  "role": "assistant",
  "run_id": "run_7A0ICERIl9SWBca1OgbIDidy",
  "thread_id": "thread_nrrXCqnFZEfJDxVngiuyelKV"
}
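
For anyone who wants to check their own messages for this mismatch, here is a rough sketch (field names follow the raw message objects above; the marker pattern is an assumption based on this thread):

import re

CITATION_RE = re.compile(r"【\d+(?::\d+)?†[^】]*】")

def check_annotations(text_block: dict):
    # Tags that appear inline in the text.
    tags_in_value = CITATION_RE.findall(text_block["value"])
    # Tags that have a corresponding annotation entry.
    annotated = [a["text"] for a in text_block["annotations"]]
    orphans = [t for t in tags_in_value if t not in annotated]
    duplicates = [t for t in set(annotated) if annotated.count(t) > 1]
    return orphans, duplicates

# For the message above: orphans -> ["【13†source】"], duplicates -> ["【10†source】"]
# orphans, duplicates = check_annotations(message["content"][0]["text"])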

This looks pretty wrong to me. Please investigate this problem and let us know what you find. Thank you!

2 Likes

Still doesn’t work consistently. I need max two annotations per output and have only uploaded 2 documents.

2 Likes

This is still an issue on January 25.
We have a single file, but sometimes we get the text citation 【0†source】 while the annotations array is empty.

3 Likes

Jumping in to say that I’m also experiencing this issue for our clients and would be glad if it was resolved!【29†source】

2 Likes

Same here… no way to get them. I found a post from someone explaining a very involved workaround, which I’m not sure we can all apply. Counting down to the bug fix :slight_smile:

3 Likes

The issue persists on openai 1.12.0 too. I even did what people suggested and deleted the assistant, deleted the files, and uploaded the files again.

All of these issues seem to be resolved today.

1 Like

Thanks for the heads up @philip3. That would be grand. I’m running some tests with the same setups we discussed above and will report back.

Re-Testing Annotation Functionality:

Note: All tests were performed both in the Playground and through the API.

A newly created assistant with one or multiple files now always returns annotations properly, but in some multi-file cases the correct file is referenced while no source text is displayed in the annotation, just the file_name.

Previously created assistants, with new or already-uploaded files, give the same results as above.

It appears annotations are working about 75% of the way now. The example in the OpenAI documentation on how to translate annotations is also functioning as expected in every case tested.
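
For reference, that documented translation pattern looks roughly like this (the thread ID is a placeholder; object shapes follow the v1 message examples earlier in this thread):

from openai import OpenAI

client = OpenAI()

# Fetch the most recent message in a thread (thread ID is a placeholder).
messages = client.beta.threads.messages.list(thread_id="thread_...")
text = messages.data[0].content[0].text

# Replace each inline 【n†source】 marker with a numbered footnote.
value, citations = text.value, []
for i, annotation in enumerate(text.annotations):
    value = value.replace(annotation.text, f" [{i}]")
    file_citation = getattr(annotation, "file_citation", None)
    if file_citation:
        cited_file = client.files.retrieve(file_citation.file_id)
        citations.append(f"[{i}] {file_citation.quote} from {cited_file.filename}")

print(value + "\n\n" + "\n".join(citations))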

I’ve been working with Assistants for a while, and I always found the annotations field empty.

Playing with the prompt and the instructions, I noticed that if you ask for the references in the prompt as well (and not only in the instructions), it tends to return the list of references in the annotations too, but not always.

Many times it outputs something like 【41†source】 in the response, but with empty annotations. I’m on the latest available version of openai (1.12.0).

I hope this gets fixed, because it looks like a simple bug: it’s clear the assistant is retrieving the context, since it inserts the source with a number, but it simply doesn’t return the mapping for that source.

It is working much much better.

The retrieval is still a black box, and it doesn’t always work as expected. But the quoting is working quite reliably now.

It seems to consume a massive amount of tokens compared to the custom RAG setups we’ve tried; it either tries to read all files or pulls complete short files into context. This creates a common issue where non-ideal documents are used to answer when keyword/semantic matches are found (we don’t know how it works in detail).

There are still occasional empty annotations, so make sure your code cleans the output up nonetheless.

PS. Not sure why, but you need a new assistant to notice the improvements.

2 Likes

Indeed, now that I created a new assistant object (with the exact same file setup as before), I am finally seeing annotations populated.

Hello,

We are still receiving an empty “quote” property (or, now with V2, not getting the property at all). We have had 2 days in the last month when we got this info, so technically it does return it sometimes (the last time was this Monday, for a couple of hours; after that night the same assistant stopped giving quotes).

It seems most people are no longer facing this issue (at least the activity has died down), but my team and I have spent more than a week on this issue alone: trying all models, creating new assistants, uploading new files, and, with the V2 release yesterday, creating new vector stores and again new assistants. Pretty much everything, but we get an empty quote on the V1 API and no quote on V2.

Here is the response from one of our tests, where we added the Wikipedia page about tigers, so size and token limitations don’t apply. In one test we managed to get the Assistant to give us the exact quote from the file in the text, but not in the annotation.

 ""content"": [
        {
          ""type"": ""text"",
          ""text"": {
            ""value"": ""The Latin name of the tiger is *Panthera tigris*【4:1†source】."",
            ""annotations"": [
              {
                ""type"": ""file_citation"",
                ""text"": ""【4:1†source】"",
                ""start_index"": 48,
                ""end_index"": 60,
                ""file_citation"": {
                  ""file_id"": ""file-jasvwEwCEmvFkWANq33fMBXR""
                }
              }
            ]
          }
        }
      ]

The quote should be there based on the message object documented here: https://platform.openai.com/docs/api-reference/messages/object
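
Until this is fixed, we read it defensively; a minimal sketch, assuming message is a thread message retrieved with the Python SDK:

for annotation in message.content[0].text.annotations:
    file_citation = getattr(annotation, "file_citation", None)
    # `quote` is usually empty for us on V1 and absent on V2, so don't rely on it.
    quote = getattr(file_citation, "quote", None) if file_citation else None
    print(annotation.text, quote)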

We are all out of ideas at this point. Any help would be appreciated; we are willing to try anything (except adding extra tools like LangChain, since OpenAI presents this as end-to-end, which is how we intend to use it).

*We are using both the APIs and the Playground for testing. Here is the Playground:
[Playground screenshot]

For us, citations have been working steadily. Here is an example from 3 minutes ago:

[screenshot]

Our real problem is that it does not use the vector store/files by default, so we need to force it in the prompt to use the attachment; otherwise it falls back on general knowledge instead of the RAG.

We are using V2, by the way, and we get the same results from the Playground as from the API. In V1 the file was always used first, with a fallback to the LLM only when no match was found.

Hmm, but I’m also not seeing any quote in your annotation. We are getting references (same as you, mostly when forcing it by asking, but at least something), but pretty much always without a quote. We need to get the quoted passage with the annotation, as described in the documentation.

This is what we got at one point, and it’s pretty much what we want: not only the file reference, but also the part of the file behind that reference:

*a picture from internal communication a while back