Unexpected citation markers appearing in text output when using File Search

Hi community,

I’m building an application using the OpenAI Agents SDK with the GPT-5 model.
When the agent has the OpenAI native File Search tool enabled, I’ve noticed that the response output text sometimes contains unexpected citation markers at the end — for example:

fileciteturn0file2turn0file6

This happens when I read the text from the text_message_output event.

Is this the intended behavior for GPT-5 when using File Search?
If not, is there any way to completely disable these citation markers?

I’ve tried adding rules in the agent’s instructions to suppress them, which seems to help occasionally, but not consistently. Am I missing something in the setup or configuration?

Thanks in advance for any insights!

What I captured before by having the AI escape it:

image

What GPT-5 has now, plus your paste so it can be seen:

(I added linefeeds)

The file_search tool internally has language instructing the AI to produce these annotations to cite the chunk.

On the Responses API, the backend is supposed to recognize these markers within the AI-written text, parse and strip them out, and then return them to you as “annotations”, each carrying a text index position for where the marker originally sat in the output text.
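
To illustrate what consuming such annotations could look like, here is a minimal Python sketch. It assumes each annotation is a dict with an integer "index" into the output text and a "filename"; that shape is an assumption for illustration, and render_with_footnotes is a hypothetical helper, not part of any SDK.

```python
def render_with_footnotes(text, annotations):
    """Insert [n] at each annotation's index and list the cited files.

    `annotations` is assumed to be a list of dicts with "index" and
    "filename" keys (an illustrative shape, not a guaranteed API schema).
    """
    numbered = list(enumerate(annotations, start=1))
    out = text
    # Insert from the highest index down so earlier positions stay valid.
    ordered = sorted(numbered, key=lambda p: (p[1]["index"], p[0]), reverse=True)
    for n, ann in ordered:
        i = ann["index"]
        out = out[:i] + f"[{n}]" + out[i:]
    sources = [f"[{n}] {ann['filename']}" for n, ann in numbered]
    return out, sources


# Made-up sample data, only to exercise the helper.
sample_text = "Cats purr. Dogs bark."
sample_annotations = [
    {"type": "file_citation", "index": 10, "filename": "cats.pdf"},
    {"type": "file_citation", "index": 21, "filename": "dogs.pdf"},
]
rendered, sources = render_with_footnotes(sample_text, sample_annotations)
print(rendered)
print("\n".join(sources))
```

If the backend has done its job, the text itself should contain no raw markers, and the citations live only in the annotation list.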

Either:

  • the AI is writing them slightly wrong (if the failure is varied and occasional), such as two citations in one marker
  • the backend is not catching them (if the failure is always the same)
  • the Agents SDK is consuming its own version of the API and not doing what should be done
  • gpt-5 is a random token factory

If you do not want these, you cannot change the tool’s message, nor can you really speak with the authority of the “system” level that places internal tools.

I would write your own parser/stripper until the fault is recognized and repaired.
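
A minimal sketch of such a stripper in Python follows. It is based only on the marker shapes pasted in this thread (fileciteturn0file2turn0file6 and citeturn1view0turn4view0). The private-use delimiter code points are an assumption: only U+E201 is named in this thread, while U+E200 (open) and U+E202 (separator) are guesses, and strip_citation_markers is a hypothetical helper name.

```python
import re

# Assumed delimiters: only CLOSE (U+E201) is named in this thread;
# OPEN and SEP are assumptions about the surrounding private-use characters.
OPEN, CLOSE, SEP = "\ue200", "\ue201", "\ue202"

# Delimited form: OPEN, "cite" or "filecite", one or more SEP-prefixed ids,
# and an optional CLOSE (the model sometimes fails to close the marker).
DELIMITED = re.compile(rf"{OPEN}(?:file)?cite(?:{SEP}turn\d+[a-z]+\d+)+{CLOSE}?")

# Bare form: what is left when the private-use characters were lost in transit.
BARE = re.compile(r"(?:file)?cite(?:turn\d+[a-z]+\d+)+")

# Any stray delimiter left behind by a malformed marker.
STRAY = re.compile(f"[{OPEN}{CLOSE}{SEP}]")


def strip_citation_markers(text: str) -> str:
    """Remove citation markers and leftover delimiter characters."""
    text = DELIMITED.sub("", text)
    text = BARE.sub("", text)
    return STRAY.sub("", text)


print(strip_citation_markers("See the manual. fileciteturn0file2turn0file6"))
```

Run it on the complete message text rather than on individual streamed chunks, since a marker can be split across chunks. The bare pattern is deliberately narrow so that ordinary words containing "cite" are left alone.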

Hey, I got literally the same error, citeturn1view0turn4view0, but I am not using the file tool; I am using the web search tool, through the Vercel AI SDK with GPT-5.

@OpenAI_Support
Can you help or share any insights into this? Is there any recommended way to skip these citation markers in GPT-5 models?
Thank you!

I am seeing the same issue using GPT-5 nano over the Responses API when using the file search tool. I’ve also tried to handle this through system prompt rules, but the issue still occurs occasionally. @OpenAI_Support what can you recommend to prevent this issue?

The AI must close the citation with the correct character.

OpenAI is using the multibyte private-use Unicode code point seen earlier as the delimiter. This is an extremely poor choice of tool language by OpenAI, in code that you can’t change; all you can do is point out how dumb a choice it was, and use your own functions.

Such a character is an unlikely prediction for the model, as it is unlikely to be well trained and carries little meaning. That is likely why the AI does not close a citation, but instead outputs the middle character again, followed by another citation index.

You must use the Unicode character (code point U+E201, three bytes in UTF-8) yourself:
s = '\ue201'

Then reinforce over and over in system prompts how to write and how to finish and close a citation.
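
Because the character is invisible in most editors, it helps to confirm what you are actually pasting into a prompt or seeing in a response. The short Python sketch below inspects the closing character named above and shows a hypothetical helper, show_hidden, that makes such characters visible in logs.

```python
import unicodedata

s = "\ue201"  # the closing character named above

print(f"U+{ord(s):04X}")         # code point in hex
print(unicodedata.category(s))   # "Co" is the private-use category
print(s.encode("utf-8"))         # the three UTF-8 bytes


def show_hidden(text: str) -> str:
    """Escape non-ASCII characters so invisible delimiters show up in logs."""
    return text.encode("unicode_escape").decode("ascii")


print(show_hidden("cite" + s))
```

Logging model output through an escape step like this makes it easier to tell whether a marker was left unclosed or was closed with the wrong character.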

Maybe if the AI sees it a dozen times, it will be able to repeat the correct citation character. Or, if you use ‘nano’, you are talking to a model that can barely write in human language.