How can I correlate inline citations with file search results?

There is a problem with the Responses API when using the file_search tool.
When the response finishes, I retrieve the file search results from the “response.completed” event, in the output item with type == “file_search_call”. But these results are not linked to the inline citations (the “response.output_text.annotation.added” events).
How can I link them? The inline citations produced during streaming are not correlated with the outputs of the file search tool.

Here are the parameters returned:
“”"
Annotation added: ResponseOutputTextAnnotationAddedEvent(annotation={'type': 'file_citation', 'file_id': 'file-PxgQXZs21T1ZkSwomwcMMN', 'filename': 'IOPB167_rev00.pdf', 'index': 2717}, annotation_index=6, content_index=0, item_id='msg_68935f081f2c81a3a9fe233c613a7307098e636c9eaaff63', output_index=1, sequence_number=640, type='response.output_text.annotation.added')
“”"

and

“”"
File search result: Result(attributes={‘machine_name’: ‘eP1030 series 6’}, file_id=‘file-PxgQXZs21T1ZkSwomwcMMN’, filename=‘IOPB167_rev00.pdf’, score=0.9039, text=“Operating Instructions \n\nClass \nINSTALLATION OF 5TH AXIS OPTION \n\nPress Brake (X2) \nGS C4 \n\nAutore/Author Data/Date Approvaz./Approval Data/Date \n\nCodice/Code IOPB167rev.00 Damiano \nGiuriato 30/07/2019 Marco Costa 29/08/2019 \n\nClass: C3: interna/internal only C4: non riservato/unreserved \n\nInstallazione Quinto asse \nINSTALLATION OF 5TH AXIS OPTION \n\nPress Brake (X2) \n\nPrima Industrie Spa 1/13 \n\n5TH AXIS \n\n00 30/07/2019 Issue DG 29/08/2019 \nRev. Date Note Author Approved \n\nThe contents of this document are the intellectual property of PRIMA INDUSTRIE SPA. Apart from the \ncontractually-agreed user rights, any copying or sharing of this document in any way is forbidden without \nthe written authorization of PRIMA INDUSTRIE SPA. \n\n\\italy\data\SharedCV\Documentazione_SGQ\IOPB_Operative_Instruction\IOPB167_Upgrade Quinto Asse\Word\IOPB167_Upgrade Quinto Asse-3.doc Model: \nIOPB_Template \n\n\n\nOperating \nInstructions INSTALLATION 5TH AXIS OPTION Press Brake (X2) \n\n \n\n \n\nPrima Industrie Spa 2/13 \n\\italy\data\SharedCV\Documentazione_SGQ\IOPB_Operative_Instruction\IOPB167_Upgrade Quinto Asse\Word\IOPB167_Upgrade \nQuinto Asse-3.doc Model: IOGB_Template \n\n \n\nIndice/Index \n\n1. SCOPO DEL DOCUMENTO / AIM OF THE DOCUMENT …3 \n\n2. APPLICAZIONE DEL DOCUMENTO / APPLICATION OF THE DOCUMENT …3 \n\n3. RIFERIMENTI / REFERENCES …3 \n\n4. RESPONSABILITÀ / RESPONSIBILITY …3 \n\n5. PROCEDURA / PROCEDURE …3 \n\n \n\n\n\nOperating \nInstructions INSTALLATION 5TH AXIS OPTION Press Brake (X2) \n\n \n\n \n\nPrima Industrie Spa 3/13 \n\n1. SCOPO DEL DOCUMENTO / AIM OF THE DOCUMENT \n\nIT - Lo scopo di questa procedura è descrivere il montaggio dell’opzione 5° asse (asse \nX2) in una macchina eP-Brake. 
\nEN - The aim of this procedure is to describe the assembly of the 5th axis option (X2 \naxis) in an eP-Brake machine. \n \n\n2. APPLICAZIONE DEL DOCUMENTO / APPLICATION OF THE DOCUMENT \n\nIT - A qualsiasi macchina eP-Brake con Back Gauge (non con Front Gauge). \nEN - Any eP-Brake machine with Back Gauge (not Front Gauge). \n \n\n3. RIFERIMENTI / REFERENCES \n\nIT - Fare riferimento al manuale macchina per gli schemi meccanici e i part number. \nEN - Refer to the machine’s manual for mechanical drawings and part numbers. \n \n\n4. RESPONSABILITÀ / RESPONSIBILITY \n\nIT - È responsabilità del tecnico di applicare tutti i passaggi e di svolgere un collaudo \ncorretto dell’opzione. \nEN - It is the technician’s responsibility to follow all the steps and test the option \ncorrectly.”)
“”"
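To make the gap concrete: the two payloads above can only be joined on `file_id` (or `filename`). Here is a minimal sketch using plain dictionaries shaped like the printed event and result. The helper names `group_results_by_file` and `candidates_for` are hypothetical, not part of the SDK, and the data is illustrative. It shows that a citation maps to every chunk retrieved from the same file, not to one chunk.

```python
from collections import defaultdict


def group_results_by_file(results):
    """Index file search results by file_id, the only field shared with annotations."""
    by_file = defaultdict(list)
    for result in results:
        by_file[result["file_id"]].append(result)
    return by_file


def candidates_for(annotation, by_file):
    """Return every retrieved chunk that could back this citation."""
    return by_file.get(annotation["file_id"], [])


# Illustrative data only: two chunks from the same file, one citation.
results = [
    {"file_id": "file-A", "filename": "manual.pdf", "score": 0.90, "text": "chunk one"},
    {"file_id": "file-A", "filename": "manual.pdf", "score": 0.75, "text": "chunk two"},
    {"file_id": "file-B", "filename": "other.pdf", "score": 0.60, "text": "chunk three"},
]
annotation = {"type": "file_citation", "file_id": "file-A", "filename": "manual.pdf", "index": 2717}

by_file = group_results_by_file(results)
matches = candidates_for(annotation, by_file)
print(len(matches))  # 2: the citation is ambiguous between both chunks of file-A
```

With a single chunk per file the join is exact. The ambiguity appears as soon as one file contributes several chunks.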


I’m having the exact same issue, except on the Assistants API, using the Azure Python SDK.

I request the file search contents on my stream() call with the include field, and it does work: it gives me the chunks of the original text in a thread.run.step.completed event.
But it also includes other, irrelevant chunks that are not used in the annotations, with no apparent mapping possible between the results and the annotations. I even checked the numbers in the weird 【n:m†source】 text that gets returned, and found nothing.

I know this topic is about the Responses API, but it’s basically the same thing; I’m guessing the two are linked somehow.

Previously I used the Assistants API from openai-python. I switched because the Assistants API will be deprecated in 2026.
With Assistants I managed to do what I am not able to do now with the Responses API. Here is some code:

# Excerpt from the stream event loop (hence the leading elif).
# ThreadMessageCompleted comes from openai.types.beta.assistant_stream_event.
elif isinstance(event, ThreadMessageCompleted):
    id_message_openai = event.data.id
    message = event.data.content[0].text.value
    if use_citations:
        # Pair each annotation with the text segment that precedes its marker
        annotations = event.data.content[0].text.annotations

        previous_end = 0
        previous_text = ""  # avoids reading annotations[-1] on the first iteration

        for annotation in annotations:
            start = annotation.start_index
            end = annotation.end_index

            # Check if this marker is consecutive to the previous one
            if start == previous_end:
                # If consecutive, assign the same text segment as the previous annotation
                annotation.associated_text = previous_text
            else:
                # Extract text between the previous segment end and the current start
                annotation.associated_text = message[previous_end:start].strip()

            # Update state for the next iteration
            previous_text = annotation.associated_text
            previous_end = end

I am facing the same problem right now :(

Thanks, that’s true, but the problem occurs when there are multiple citations to the same file. In my project, I must know which piece of text relates to a specific annotation, so that the user can see the original reference text.

In Responses, the annotation now carries a positional index. The citation markers are stripped out of the text completely by the API backend, instead of being something that you need to write a detector for (otherwise you get strange text).

The index is the position, counted in code points, in the response text where you can re-insert whatever type of link you want to the resource in the assistant output text.
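As a rough sketch of that re-insertion, assuming each annotation is a dictionary with an `index` field like the one printed earlier in the thread, and that the index is an offset into the final output text: the function name `insert_citation_markers` and the `[n]` marker format are illustrative choices, not anything the API prescribes.

```python
def insert_citation_markers(text, annotations):
    """Insert a numbered marker such as [1] at each annotation's index."""
    # Number citations in reading order.
    ordered = sorted(annotations, key=lambda a: a["index"])
    numbered = list(enumerate(ordered, start=1))
    # Insert from the end of the text so earlier offsets stay valid.
    for number, annotation in reversed(numbered):
        pos = min(annotation["index"], len(text))
        text = text[:pos] + f" [{number}]" + text[pos:]
    return text


# Illustrative data only.
text = "Install the axis. Then test it."
annotations = [
    {"type": "file_citation", "file_id": "file-A", "filename": "manual.pdf", "index": 17},
    {"type": "file_citation", "file_id": "file-A", "filename": "manual.pdf", "index": 31},
]
print(insert_citation_markers(text, annotations))
# Install the axis. [1] Then test it. [2]
```

Python strings are indexed by code point, so slicing matches the offset described above. This places the markers but still says nothing about which retrieved chunk each marker refers to.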

I think that’s not the point.
Here is the problem explained again:
”””
Problem with streaming in the Responses API using the file_search tool:
In my application I need to do RAG (so, the file_search tool) and cite the source text inline (so, in the middle of the generated text). I use the file_search output results, which contain the text, and the “response.output_text.annotation.added” events, which give the position. BUT: these two things are not correlated! How can I know which result text belongs to a specific inline citation? They share only the filename/file_id, and there can be multiple citations to the same file. Is this a gap in the Responses API or am I missing something?
”””
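Absent a direct link, one possible workaround is a guess based on word overlap: compare the generated text just before the citation with each chunk from the same file and pick the closest. This is only a heuristic sketch and can mis-assign chunks; it is not something the API provides. The function names, the `window` parameter, and the sample data are all hypothetical.

```python
import re


def tokens(text):
    """Lowercased word set used for a rough lexical comparison."""
    return set(re.findall(r"\w+", text.lower()))


def best_chunk(output_text, annotation, results, window=300):
    """Guess which same-file chunk backs a citation by word overlap."""
    start = max(0, annotation["index"] - window)
    context = tokens(output_text[start:annotation["index"]])
    candidates = [r for r in results if r["file_id"] == annotation["file_id"]]
    if not candidates or not context:
        return None
    # Score = share of context words that also appear in the chunk.
    return max(candidates, key=lambda r: len(context & tokens(r["text"])) / len(context))


# Illustrative data only: two chunks from the same file.
output_text = "The procedure applies to machines with a back gauge."
annotation = {"type": "file_citation", "file_id": "file-A", "index": len(output_text)}
results = [
    {"file_id": "file-A", "text": "Refer to the manual for part numbers."},
    {"file_id": "file-A", "text": "Any machine with back gauge, not front gauge."},
]
print(best_chunk(output_text, annotation, results)["text"])
# Any machine with back gauge, not front gauge.
```

Word overlap is crude, especially when the answer paraphrases or translates the source. An embedding similarity could replace the scoring function, but the result would still be an estimate and not the model's actual source.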