Assistants API - Problems with file citation annotations

I’ve developed a chat client that renders text annotations as described in the API documentation. It worked well, and file citations rendered properly until yesterday; at least, that’s when I got a bug report about it.
I’m using the Python openai client with AsyncOpenAI, version 1.9.0. (I upgraded to it from 1.7.1 about a week ago, but I can’t tie the bug to that change.)
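For reference, my annotation rendering follows the example in the docs, roughly like this (a simplified sketch; the thread and message IDs are placeholders, and error handling is omitted):

    from openai import AsyncOpenAI

    client = AsyncOpenAI()

    async def render_message(thread_id: str, message_id: str) -> str:
        message = await client.beta.threads.messages.retrieve(
            thread_id=thread_id, message_id=message_id
        )
        # Assumes the first content block is text, not an image.
        content = message.content[0].text
        value, citations = content.value, []
        for i, annotation in enumerate(content.annotations):
            # Replace the inline marker (e.g. 【7†source】) with a footnote index.
            value = value.replace(annotation.text, f" [{i}]")
            file_citation = getattr(annotation, "file_citation", None)
            if file_citation:
                cited_file = await client.files.retrieve(file_citation.file_id)
                citations.append(f"[{i}] {file_citation.quote} (from {cited_file.filename})")
        return value + "\n" + "\n".join(citations)
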
Summary: when the assistant cites text from a knowledge file, I no longer get a proper file_citation annotation. Instead, I get either no annotation at all, or a file_citation annotation without a file ID.
Here’s what I did:
My assistant is simply a developer assistant that I use to debug the API. It can return a “text” response, return function calls, and run code interpreter, and it has retrieval enabled. It has a small Knowledge.pdf file from which I ask it to quote text.
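For completeness, the assistant is set up roughly like this (a sketch using the v1 shapes from openai 1.9.0; the function tool and instructions are omitted):

    async def create_debug_assistant():
        # Upload the small knowledge file the assistant quotes from.
        knowledge = await client.files.create(
            file=open("Knowledge.pdf", "rb"), purpose="assistants"
        )
        return await client.beta.assistants.create(
            name="Developer assistant",
            model="gpt-4-1106-preview",
            tools=[{"type": "retrieval"}, {"type": "code_interpreter"}],
            file_ids=[knowledge.id],
        )
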
Test 1:
Model: gpt-4-1106-preview
Message: “Please return a random response with a quote from your Knowledge.pdf file along with the proper annotation.”

Response:
[MessageContentText(text=Text(annotations=[TextAnnotationFileCitation(end_index=62, file_citation=TextAnnotationFileCitationFileCitation(file_id='', quote='Oren is cool.\n2 - Elisha is cool.\n# 【1†Knowledge.pdf†file-aHHFulAMdwttPsOycx0Vh3ae】\nThis is a test knowledge file. Sentences to quote: 1 - Oren is cool'), start_index=52, text='【7†source】', type='file_citation')], value='Here isandom quote from the file: "Oren is cool"【7†source】.'), type='text')]

Problems:

  • Bad annotation format.
  • No file ID.
  • The annotation quote differs from the actual quote; it contains the entire content of the file.
  • A typo in the response value (“Here isandom quote”), but that’s a separate issue.

Test 2:
Model: gpt-4-0125-preview
Message: “Please return a random response with a quote from your Knowledge.pdf file along with the proper annotation.”
Response:

[MessageContentText(text=Text(annotations=[], value='Random quote from "Knowledge.pdf": "Oren is cool."【5†source】.'), type='text')]

Problems:

  • No file annotation returned at all.

Am I missing anything, or is this really a bug?

Did you try a different model (like the 0125 preview)?

Of course. Look at the two tests I described: the first uses gpt-4-1106-preview and the second gpt-4-0125-preview.
I’ve run this many times, of course; these are simply two samples that consistently reproduce the problem.

I have been using prompts like “Add a chart” (at a specific spot in a message it is also creating) and noticed that this is not 100% repeatable. (Sometimes I get the image inserted in the correct spot; sometimes I just get the file generated as part of the overall message.)

In your case I wonder if it gets confused by what it should do vs. what it already does under the hood, i.e. producing a quote because the prompt asked for one vs. “I did some RAG on the file to get an answer, here is the annotation”. Can it even “add an annotation” on request, I wonder? (I.e., is this something that can be prompted, or is it built in as part of the RAG mechanism… )

The strange thing is that I used this exact setup to develop our annotation rendering, which worked as expected, and now it feels as if something changed. The empty file_id in Test 1 is definitely a new issue, which seems to be a bug. The response in Test 2 seems like an issue that correlates with what you described.
I have found many more issues with annotations but managed to create workarounds for most. One issue I still don’t have a solution for is fetching the content surrounding the quote. For example, if the quote is a single sentence, it seems like a good idea to let the end user see the “context window” from which it was taken.
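The closest workaround I have is keeping a local copy of the source document and looking the quote up in it myself; quote_context below is a hypothetical helper along those lines (it fails when the model paraphrases instead of quoting verbatim):

    def quote_context(source_text: str, quote: str, window: int = 200):
        # Find the quoted passage in a local copy of the source document
        # and return `window` characters of context on each side.
        pos = source_text.find(quote)
        if pos == -1:
            return None  # quote was paraphrased or not found verbatim
        start = max(0, pos - window)
        end = min(len(source_text), pos + len(quote) + window)
        return source_text[start:end]
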
I’ve also found many edge cases when dealing with file and image creation (via code interpreter), which tends to be a bit unpredictable. Also, the object type of the returned citation is not always the same, so you need many “if” clauses to make the code safe.
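To keep the rendering from crashing, I ended up branching on the annotation’s attributes rather than on its concrete class; describe_annotation below is my own helper, just a sketch of the idea:

    def describe_annotation(annotation):
        # file_citation and file_path annotations come back as different
        # object types, so probe for the attributes instead.
        file_citation = getattr(annotation, "file_citation", None)
        if file_citation is not None:
            if not file_citation.file_id:  # guard against the empty-file_id bug above
                return f"(citation without file id) {file_citation.quote!r}"
            return f"cites file {file_citation.file_id}: {file_citation.quote!r}"
        file_path = getattr(annotation, "file_path", None)
        if file_path is not None:
            return f"generated file {file_path.file_id}"
        return None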


I think we won’t be able to solve the quotation problem without additions to the API that help us “define” the window, or request a larger section. We don’t have access to the internal RAG data, so I don’t know whether there is a way to do it. And since it ingests so many different types of content, the result might also not “look pretty” depending on the content. But I can see there being parameters to configure annotations in the future.

Hey, I’m facing the same issue. Has the problem been solved at your end? If so, I’d love to hear what you suggest to fix it.

Specifically, I also followed the documentation for replacing file annotations, but it is not working.

Same issue at my end… I had a working application that retrieved file citations as intended, and it suddenly stopped working. It looks like some alteration to the API caused this.

Since upgrading to API version 2 and using GPT-4o, I never get any quoted text in the file_citation data anymore. The quotes from the uploaded text that appear in the response text are much better with GPT-4o, but for my use case I need the actual query results from the vector database.

PS: Just found this response on the topic. It seems like I’ll have to give up on the Assistants API.
Strange that OpenAI would release this API so half-baked. What use is RAG if you cannot validate the output? :roll_eyes:


You might want to keep an eye on this GitHub issue; I’m trying to nail the staff down on where they’re tracking this.


Actually, the quote markers repeat themselves, and I have a feeling that in the “Threads” view of the dashboard there is some sort of “AI” (or just post-processing) used to relate them to the actual citations in the annotation part of the response.

Is there any hint on how to use the indexes in the annotations? I am particularly interested in extracting quotes myself using start_index / end_index, if that is possible at all.
Does anyone have practical experience with those indexes?
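From the Test 1 output above, start_index / end_index appear to delimit the inline 【7†source】 marker inside text.value, not the quoted passage in the source file. Assuming that holds, something like this untested sketch should splice in custom markers:

    def splice_footnotes(value: str, annotations) -> str:
        # Work from the highest start_index down so that earlier
        # indices stay valid after each splice.
        for i, ann in sorted(enumerate(annotations),
                             key=lambda p: p[1].start_index, reverse=True):
            value = value[:ann.start_index] + f"[{i}]" + value[ann.end_index:]
        return value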

I am having the same problem: the assistant seems to hallucinate completely when asked to provide quotes from a text file. I don’t seem to have the problem when using the chat version.

Does anyone have more information on this topic?