How to handle file citations with the new Responses API?

axelrothing · March 18, 2025, 7:29pm

Displaying File Citations After Migrating from Assistants API to Responses API

Issue

I’ve recently migrated from the Assistants API (with its file search tool) to the new Responses API with the new file search tool. I’m encountering an issue with handling file citations in the output.

Previous Implementation

With the Assistants API, I could:

Retrieve a file citation object from the assistant’s tools
Use citation.text to access the placeholder for each citation
Replace these placeholders with custom citation formats (numbers, etc.)
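
For readers comparing the two approaches, the placeholder replacement described above can be sketched roughly as follows. This is an illustration only: `OldCitation` is a hypothetical stand-in for the Assistants API annotation object, and the placeholder string and file ID are made-up sample values.

```python
from dataclasses import dataclass


@dataclass
class OldCitation:
    # Hypothetical stand-in for an Assistants API file-citation annotation.
    text: str     # placeholder exactly as it appears in the message text
    file_id: str  # file the citation points to


def replace_placeholders(message, citations):
    """Swap each citation placeholder for a numbered marker like " [1]"."""
    footnotes = []
    for number, citation in enumerate(citations, start=1):
        message = message.replace(citation.text, f" [{number}]")
        footnotes.append(f"[{number}] {citation.file_id}")
    return message, footnotes


# Sample values, not real API output.
message = "Sensitivity was 55%【4:0†source】."
out, notes = replace_placeholders(
    message, [OldCitation("【4:0†source】", "file-abc")]
)
print(out)
print(notes)
```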

Current Behavior

Now with the new Responses API, I’m only receiving:

AnnotationFileCitation(file_id='file-(redacted)', index=3249, type='file_citation', filename='(redacted).pdf')

Questions

How can I properly handle file citations with the new Responses API?
What is the meaning of the index parameter? Does it indicate the starting position of the citation within the text?
Is there a recommended approach for formatting citations when using the file search preview tool?

Any guidance on implementing a citation system with the new API would be greatly appreciated.

stevecoffey · March 24, 2025, 10:28pm

Hey @axelrothing!

Steve here from the OpenAI engineering team! I worked on this feature. The idea here is that instead of having to remove the citations from the text, which is annoying and error-prone, you now have the option of inserting them. You are correct in that index is the place in text where the citation should go.

I’ll make sure we update our docs to be crisp with some code samples!
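
Until official samples are available, here is a minimal sketch of the insertion approach described above. It assumes that index is a character offset into the message text; `FileCitation` is a hypothetical stand-in that mirrors the fields printed for `AnnotationFileCitation`, and the sample text, file IDs, and filenames are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileCitation:
    # Hypothetical stand-in with the fields printed for AnnotationFileCitation.
    file_id: str
    index: int
    filename: str
    type: str = "file_citation"


def insert_citations(text, annotations):
    """Insert numbered markers such as " [1]" at each annotation's index."""
    # Give each distinct file one number, in order of first appearance.
    numbers = {}
    for a in sorted(annotations, key=lambda a: a.index):
        numbers.setdefault(a.file_id, len(numbers) + 1)

    # Work from the end of the text so earlier indexes stay valid.
    for a in sorted(annotations, key=lambda a: a.index, reverse=True):
        text = text[:a.index] + f" [{numbers[a.file_id]}]" + text[a.index:]

    sources = {numbers[a.file_id]: a.filename for a in annotations}
    return text, sources


# Sample values, not real API output.
text = "PPV was 49%. Sensitivity was 55%."
annotations = [
    FileCitation("file-A", text.index("."), "a.pdf"),
    FileCitation("file-B", text.rindex("."), "b.pdf"),
]
cited, sources = insert_citations(text, annotations)
print(cited)
for number, filename in sorted(sources.items()):
    print(f"[{number}] {filename}")
```

Inserting from the highest index down is the key design choice: each insertion lengthens the text, so going front to back would shift every later index.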

axelrothing · March 29, 2025, 1:53pm

Thank you so much for the clarification, and it’s great that you are updating the docs!

abhay.saini · April 4, 2025, 6:43am

Hi @stevecoffey,

You mentioned that "You are correct in that index is the place in text where the citation should go."

I got this as my response:

########################
for i in response.output[1].content[0].annotations:
    print(i)
AnnotationFileCitation(file_id='file-GXzGKn4oNrRLmA372x4haU', index=343, type='file_citation', filename='cancers-13-03501-v3.pdf')
AnnotationFileCitation(file_id='file-GXzGKn4oNrRLmA372x4haU', index=614, type='file_citation', filename='cancers-13-03501-v3.pdf')
AnnotationFileCitation(file_id='file-GXzGKn4oNrRLmA372x4haU', index=722, type='file_citation', filename='cancers-13-03501-v3.pdf')
########################

But I have 20 results in the response.
How am I supposed to extract indexes 343, 614, and 722 from those 20 results?
For reference, each result looks like this:

##################################
Result(attributes={}, file_id='file-GXzGKn4oNrRLmA372x4haU', filename='cancers-13-03501-v3.pdf', score=0.256540214459889, text='Because the natural history of cfDNA-detectable cancer cases is not\nestablished, we explored a range of plausible assumptions for the distribution of dwell\ntimes (i.e., how much time each cancer spends in a given stage) per cancer per stage [38].\nThe test performance was estimated based on results from the second substudy of CCGA\n(overall sensitivity 55% across > 50 cancer types with a single, fixed false positive rate of\n<1%) [24,39]. It was assumed that up to 10% of participants might not be analyzable due to\nclinical or assay evaluability criteria.\n\nUnder the most conservative dwell time scenario, which is expected to result in\nthe fewest number of cancers in preclinical states, the median (95% confidence interval\n[CI]) “signal detected” test results were estimated to be 106 (87–128) among analyzable\nparticipants. The number of cancers detected through diagnostic evaluation following a\n“signal detected” test result was estimated to be 52 (95% CI, 39–67). Thus, overall PPV for\ncancer detection was estimated to be 49% (95% CI, 39%–58%).\n\n\n\nCancers 2021, 13, 3501 8 of 11\n\n4. Discussion\n\nAlthough several novel cfDNA-based multi-cancer detection strategies have demon-\nstrated the analytical performance needed for population-scale early cancer detec-\ntion [20,24,40,41], assessment of clinical implementation is limited. The feasibility and\nsafety of MCED tests to be integrated into existing cancer diagnostic workflows and\nto complement existing guideline-recommended cancer screening tests has been ad-\ndressed in only one prior study, DETECT-A [22]. In this study, 9911 women received\nthe multi-analyte CancerSEEK test at baseline. In 490 women with abnormal results,\nan independent test for the abnormal biomarker responsible for the CancerSEEK test\nresult was performed to confirm the result as well as white blood cell (WBC) DNA\nsequencing to rule out clonal hematopoiesis of indeterminate potential (CHIP). A Mul-\ntidisciplinary Review Committee reviewed the medical history of 134 women with\nconfirmed positive CancerSEEK results not ruled out by non-cancer-related causes\nand recommended 127 to undergo full-body positron emission tomography-computed\ntomography (PET-CT) for further confirmation and tumor localization. Findings from\nDETECT-A demonstrated that a blood-based multi-cancer test accompanied by PET-CT\ncan be safely integrated into existing cancer diagnostic pathways without affecting\nadherence to guideline-recommended mammography screening. Newer generations\nof the CancerSEEK test do not require a confirmatory test to rule out mutations due to\nCHIP [22].\n\nFinal study results are not yet available for PATHFINDER; however, the PATHFINDER\nstudy design allows for an initial characterization of clinical use. PATHFINDER will\nassess the ability of an MCED test to prospectively detect cancer and will shed light on\ndiagnostic yield (i.e., the absolute number of cancers detected) and efforts required to obtain\npathologic confirmation of invasive cancer. Notably, the MCED test for PATHFINDER does\nnot require WBC sequencing, uses a locked and validated assay, and predicts CSO to direct\nthe confirmatory diagnostic workup. PATHFINDER will describe the diagnostic pathway\nfor all patients with a “signal detected” result. This will help determine the number of tests\nand procedures and the amount of time required to work up each positive MCED test to\nachieve diagnostic resolution.')
##################################################
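
One possible reading of the output above, sketched under two assumptions: first, following Steve's reply, that index is a character offset into the generated message text (the text of the same content item that holds the annotations) and not a position inside any of the 20 results; second, that the annotation carries only file_id, filename, and index, as printed, so it can be matched to results only at the file level. `FileCitation`, `Result`, and both helper functions are hypothetical stand-ins, and all values are invented samples.

```python
from dataclasses import dataclass


@dataclass
class FileCitation:
    # Hypothetical stand-in for the printed AnnotationFileCitation fields.
    file_id: str
    index: int
    filename: str


@dataclass
class Result:
    # Hypothetical stand-in for the printed Result fields.
    file_id: str
    filename: str
    score: float
    text: str


def context_before(answer, index, width=80):
    """Return up to `width` characters of the answer preceding the citation."""
    return answer[max(0, index - width):index]


def candidate_results(annotation, results):
    """Search results from the cited file, highest score first."""
    same_file = [r for r in results if r.file_id == annotation.file_id]
    return sorted(same_file, key=lambda r: r.score, reverse=True)


# Sample values, not real API output.
answer = "Overall PPV was estimated to be 49%."
cite = FileCitation("file-A", len(answer), "paper.pdf")
results = [
    Result("file-A", "paper.pdf", 0.25, "sample chunk one"),
    Result("file-B", "other.pdf", 0.90, "sample chunk two"),
    Result("file-A", "paper.pdf", 0.40, "sample chunk three"),
]

print(context_before(answer, cite.index))
for r in candidate_results(cite, results):
    print(r.score, r.text)
```

With this reading, the index values tell you where in the answer to place a marker, while the results list tells you which chunks from the cited file were retrieved; the sketch does not attempt to pick a single chunk per citation.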
