How can I access the specific text of the file that the annotation is referencing?

shulman.aj · April 24, 2024, 5:19pm

I’m trying to create a feature where an annotation in a response from an assistant (with file-search enabled) can be clicked and the specific text that it references from a PDF will be highlighted. However, although the documentation says that there is a “quote” field in the annotation, this field never shows up.

Any sort of indication of where the annotation is referencing in the file (e.g. a text offset, quote, etc.) would be helpful, but there doesn’t seem to be any way to find any meaningful subsection of a file to which the assistant is referring in its annotations.

_j · April 24, 2024, 8:27pm

The AI of assistants v2 no longer has a method to “mark” text for an annotation. It can only refer to a file ID.
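To make the limitation concrete, here is a minimal sketch of pulling citations out of a message. It assumes the message has already been converted to a plain dict with `content[].text.annotations[]` entries of type `file_citation`; the sample values (`file-abc123`, the marker text) are invented for illustration, and `file_citations` is a hypothetical helper, not part of any SDK.

```python
from typing import Any


def file_citations(message: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect file_citation annotations from a message given as a plain dict."""
    found = []
    for part in message.get("content", []):
        if part.get("type") != "text":
            continue
        for ann in part["text"].get("annotations", []):
            if ann.get("type") != "file_citation":
                continue
            citation = ann.get("file_citation", {})
            found.append({
                "marker": ann.get("text"),
                # Indices point into the assistant's answer, not into the file.
                "span": (ann.get("start_index"), ann.get("end_index")),
                "file_id": citation.get("file_id"),
                # Per this thread, "quote" does not show up in v2, so treat it as optional.
                "quote": citation.get("quote"),
            })
    return found


# Invented sample data, shaped like the assumption described above.
sample_message = {
    "content": [{
        "type": "text",
        "text": {
            "value": "Refunds take 5 days【4:0†source】.",
            "annotations": [{
                "type": "file_citation",
                "text": "【4:0†source】",
                "start_index": 19,
                "end_index": 31,
                "file_citation": {"file_id": "file-abc123"},
            }],
        },
    }]
}
```

Note that `start_index` and `end_index` locate the citation marker inside the answer text, which is why they do not help with highlighting anything in the PDF.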

shulman.aj · April 24, 2024, 8:37pm

Is there any workaround to get annotations with direct quotes?

_j · April 24, 2024, 8:45pm

You could write your own vector database tool, plus instructions, to replicate the old selection and “marking” behavior, in which the file browser received its information back with line numbers. Perhaps a tool named “line_number_range_of_document_ID_to_offer_as_documentation_download_in_user_interface_before_response_to_user”?

The v2 AI can no longer write the output that the API backend used to parse into annotations giving ranges of text.
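A rough sketch of that idea, under stated assumptions: your own retrieval code returns chunks with line numbers prepended, and the model is given a function tool to report which lines it relied on. The tool name `cite_line_range`, its parameters, and both helper functions are hypothetical; only the general function-tool schema shape is standard.

```python
def number_lines(text: str) -> str:
    """Prefix every line with its 1-based number so the model can refer to it."""
    return "\n".join(
        f"{i}: {line}" for i, line in enumerate(text.splitlines(), start=1)
    )


def resolve_line_range(text: str, first_line: int, last_line: int) -> str:
    """Map a line range reported by the model back to the original text."""
    lines = text.splitlines()
    if not (1 <= first_line <= last_line <= len(lines)):
        raise ValueError(f"line range {first_line}-{last_line} is out of bounds")
    return "\n".join(lines[first_line - 1:last_line])


# Hypothetical tool definition the model would call before answering.
CITE_TOOL = {
    "type": "function",
    "function": {
        "name": "cite_line_range",
        "description": "Report the document lines the answer is based on.",
        "parameters": {
            "type": "object",
            "properties": {
                "document_id": {"type": "string"},
                "first_line": {"type": "integer"},
                "last_line": {"type": "integer"},
            },
            "required": ["document_id", "first_line", "last_line"],
        },
    },
}
```

The model sees the output of `number_lines`, and your application calls `resolve_line_range` with the arguments of each `cite_line_range` call to recover the passage to highlight.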

nikunj · April 25, 2024, 12:37am

That’s right – we don’t have support for quotes from the file at the moment. We’ll work on adding support for this!

dahifi · June 11, 2024, 7:49pm

Is there an open issue for this? I found this one in the OpenAPI repo: quote is non-nullable, so all responses will fail client response validation.

echapeta · June 29, 2024, 3:05pm

Please, this is very important. We have no other way, because we have no access to the vector store!

aaron.lutz · July 19, 2024, 9:50am

Hello,

Are there any updates on the progress for adding support for this?

Thanks!

hinrichs · July 23, 2024, 9:49pm

Is there any update on this?

As of now, the ability to see the quoted text is a critical feature. Currently, GPT often returns multiple citations to the same file. Without the referenced text, they look redundant, and our users are asking why there are multiple citations to the same file. More importantly, most of the value of a search tool is in finding relevant information, not just relevant documents.

My team is now actively developing non-OpenAI alternatives, which we will switch over to as soon as we have a working solution.

OPSllc · July 24, 2024, 2:08am

I second this.

The removal of the quoted text feature from OpenAI’s API has significantly hindered the technology’s effectiveness. This functionality previously provided verifiable sources, making it easier for users to identify and eliminate inaccuracies. Without it, end users must weed through large amounts of information, which makes hallucinations harder to detect. This is especially important when detecting hallucinations is critical to the scientific process. Please restore it.

aaron.lutz · August 19, 2024, 10:07am

Come on guys… Still no update? It’s been almost 4 months since the post stating that you would work on it. Really, any update on the progress would be much appreciated.
Cheers.

andres_santos · September 21, 2024, 2:33pm

Any updates on this matter? The ‘quote’ information is a critical part of the file_search functionality, and its absence makes the feature unreliable. This should be a top priority.

eannamorley · September 21, 2024, 2:56pm

They did recently add the ability to view the results of the assistant’s search, i.e. the full chunks returned from the vector store. See here.

This is still far from satisfactory, though: the chunks are often far too big to use as citations, and the top-ranked chunk is not necessarily where the assistant ultimately draws its answer from.

A direct quote or quotes is pretty essential for any RAG application. Hopefully they add this soon.
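For reference, a hedged sketch of collecting those chunks once you have them. As far as I understand it, the chunk content is only returned when the run step is requested with an `include` value of `step_details.tool_calls[*].file_search.results[*].content`; check that against the current API reference. The code below makes no API call: it assumes the run step is already a plain dict shaped like the sample, and the helper names and sample values are invented.

```python
from typing import Any


def search_result_chunks(run_step: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten the file_search results of one run step into simple records."""
    chunks = []
    details = run_step.get("step_details", {})
    for call in details.get("tool_calls", []):
        if call.get("type") != "file_search":
            continue
        for result in call.get("file_search", {}).get("results", []):
            text = "".join(
                part.get("text", "")
                for part in result.get("content", [])
                if part.get("type") == "text"
            )
            chunks.append({
                "file_id": result.get("file_id"),
                "file_name": result.get("file_name"),
                "score": result.get("score", 0.0),
                "text": text,
            })
    return chunks


def chunks_for_file(chunks: list[dict[str, Any]], file_id: str) -> list[dict[str, Any]]:
    """Chunks from one cited file, best score first."""
    matching = [c for c in chunks if c["file_id"] == file_id]
    return sorted(matching, key=lambda c: c["score"], reverse=True)


# Invented sample data, shaped like the assumption described above.
sample_step = {
    "step_details": {
        "type": "tool_calls",
        "tool_calls": [{
            "type": "file_search",
            "file_search": {"results": [
                {"file_id": "file-abc123", "file_name": "policy.pdf", "score": 0.41,
                 "content": [{"type": "text", "text": "Shipping takes 3 days."}]},
                {"file_id": "file-abc123", "file_name": "policy.pdf", "score": 0.87,
                 "content": [{"type": "text", "text": "Refunds are issued within 5 business days."}]},
            ]},
        }],
    }
}
```

Joining these records to the `file_id` of each annotation narrows a citation from a whole file to a handful of chunks, with the caveat above that the top-ranked chunk is not guaranteed to be the one the answer came from.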

aaron.lutz · September 23, 2024, 7:17am

Yes, I saw this as well and it is a very welcome improvement! It is still somewhat ambiguous, since the search results that we can now inspect in the logs do not seem to match exactly what gets passed to the model. For example, the model does not know about the file names. Let’s hope they will prioritize the File Search / API a little more. I’m looking forward to dev day; let’s hope they showcase some new stuff there.

nihir · September 27, 2024, 4:31pm

I’m facing this issue too. I’m going to solve it with the following approach:

Gather the cited chunks and answer. Cited chunks have the file_id and the file_name in their response.
Feed these into GPT and ask it to “extract the sub-strings and the file_name this answer has been conditioned on”. You should then get a searchable string and the file_name to search the document with.

Unfortunately, this has the drawback of basically doubling the compute cost, since you have to feed in the chunks as the prompt twice.
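A minimal sketch of this second pass, with the model call itself left out. The prompt wording, the JSON reply format, and the function names are all assumptions made for illustration. The second function guards against the model paraphrasing instead of quoting by keeping only strings that really occur in a chunk from the named file.

```python
import json


def build_extraction_prompt(answer: str, chunks: list[dict]) -> str:
    """Prompt asking a model which verbatim sub-strings the answer relied on."""
    sources = "\n\n".join(f"[{c['file_name']}]\n{c['text']}" for c in chunks)
    return (
        "Below are an answer and the source chunks it was generated from.\n"
        "Return a JSON list of objects with keys 'file_name' and 'quote', where "
        "each quote is copied verbatim from a chunk and supports the answer.\n\n"
        f"Answer:\n{answer}\n\nChunks:\n{sources}"
    )


def verified_quotes(model_reply: str, chunks: list[dict]) -> list[dict]:
    """Keep only quotes that appear verbatim in a chunk of the named file.

    Raises json.JSONDecodeError if the reply is not valid JSON.
    """
    by_file: dict[str, list[str]] = {}
    for c in chunks:
        by_file.setdefault(c["file_name"], []).append(c["text"])
    kept = []
    for item in json.loads(model_reply):
        quote = item.get("quote", "")
        texts = by_file.get(item.get("file_name"), [])
        if quote and any(quote in t for t in texts):
            kept.append({"file_name": item["file_name"], "quote": quote})
    return kept
```

Sending only the chunks from cited files, rather than every search result, is one way to keep the extra prompt cost down.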

aukinfo · September 27, 2024, 9:46pm

Let us know how you get on. Maybe 4o-mini could do that. You could do a structured response too.

aaron.lutz · September 30, 2024, 8:44am

Yeah, I tried something kind of similar, although before they added the viewable search results. We use the cited File IDs, retrieve the file names, use Google Drive to find the file with that name (yes, we also uploaded all our RAG docs to Google Drive), and then append that to the citations so that users can view the cited documents in the chat interface via an iframe. Now, with the search results, you could do something similar: get the cited files and a reference string from the cited chunk, implement a file system for displaying the files, and perform an auto search to “jump” to that part in the file. However, I am holding out some hope that they will improve this and am looking forward to dev day. If there’s no update, then I guess this will be what we’ll have to revert to.
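The “jump to that part” step could look roughly like the sketch below. It assumes you already have the document’s extracted text and a reference string taken from a chunk; `locate_quote` is a hypothetical helper. Text extracted from PDFs often differs from the chunk in line breaks and spacing, so the search treats any run of whitespace as equivalent.

```python
import re
from typing import Optional


def locate_quote(document_text: str, quote: str) -> Optional[tuple[int, int]]:
    """Return the (start, end) character span of quote in document_text.

    Whitespace differences (line breaks, double spaces) are ignored;
    returns None when the quote cannot be found.
    """
    tokens = quote.split()
    if not tokens:
        return None
    # Match the quote's words in order, separated by any amount of whitespace.
    pattern = r"\s+".join(re.escape(token) for token in tokens)
    match = re.search(pattern, document_text)
    return match.span() if match else None
```

The returned span indexes into the extracted text, so a viewer still needs its own mapping from text offsets to page positions in order to highlight.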

mambozzo · October 11, 2024, 10:32am

Check my response in another thread: Assistant file search text retrieval - #13 by mambozzo
