What to do with Generated Citations?

I am developing a PDF assistant that uses file_search. The problem is that the AI keeps spitting out citations of the form “【3:0†source】”. They don’t make much sense, and they would only confuse my users. I’ve tried to stop the AI from generating these citations via instructions, but it continues to do so anyway.

First of all, can anyone make sense of these? If not, how can I either suppress them or force the AI to stop returning them?

EDIT

I just realized there is an annotations array in the message response that gives the text location of each citation. So theoretically, assuming they are accurate, I can link them up to a location in the PDF. I would still like to suppress the annotations if possible.
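For anyone who wants to go that route, here is a minimal sketch of turning the raw markers into numbered footnotes that can later be linked back to file locations. The dictionary shape used below (`text`, `start_index`, `end_index`, and a nested `file_citation.file_id`) mirrors what the Assistants API reports for file citations, but treat it as an assumption and verify against your own payloads:

```python
def link_citations(text, annotations):
    """Replace raw citation markers with numbered footnotes and
    return the cleaned text plus a footnote-number -> file_id map."""
    footnotes = {}
    for i, ann in enumerate(annotations, start=1):
        # ann["text"] is the literal marker as it appears in the message
        text = text.replace(ann["text"], f"[{i}]")
        footnotes[i] = ann["file_citation"]["file_id"]
    return text, footnotes

# Hypothetical sample data shaped like an Assistants message annotation:
message = "Dogs are loyal 【3:0†source】."
annotations = [
    {"text": "【3:0†source】", "start_index": 15, "end_index": 27,
     "file_citation": {"file_id": "file-abc123"}},
]
clean, notes = link_citations(message, annotations)
# clean == "Dogs are loyal [1]."
# notes == {1: "file-abc123"}
```

The `start_index`/`end_index` fields are what you would use to anchor each footnote to a position if you render the PDF yourself.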

Just tell it in the instructions not to mention citations in its responses, and test the assistant to see whether that works.


I tried instructions. The AI includes annotations less often, but it still does it maybe 40% of the time.


I also ran into that problem, and I was never able to solve it with assistant instructions alone.

Since I use Python in my application, I strip the markers out before displaying the text:

import re
clean_message = re.sub(r'【.*?】', '', assistant_response)

This replaces each marker with an empty string.


You have to handle the citation problem at the code level. There are plenty of posts about this same problem, and no amount of prompting removes the citations 100% of the time. But as @joaquin.marroquin suggested, code will fix this for sure.


@bret1 - I THINK I have been able to solve this by adding the following to the assistant instructions (remove the spaces):

Hide < code > and < div class="bottom-citations" > in your responses.


Interesting, let me try this in my app.

That does not counter anything that the AI is told to produce.

The instructions of the file search given to the AI are:

// Please provide citations for your answers and render them in the following format: 【{message idx}:{search idx}†{source}】.
// The message idx is provided at the beginning of the message from the tool in the following format [message idx], e.g. [3].

If you want to suppress the citation output on 4o models used in Assistants, you could stop the initial token with logit_bias against tokens 16488 and 1805. Oops, no you can’t, because this endpoint is for neither novices nor experts, and logit_bias can’t be used.

What you’d do is directly add your own fake # Tools with ## file_search to the end of the system instructions, and include “override instructions” that are stronger than, and ‘subclass’, OpenAI’s tool, with things that are disabled, with repercussions: """Citations are disabled. Emitting " 【" or "【" will destroy the chatbot and collapse the universe."""

To avoid 1,000 tokens of unwanted text being inserted into the AI’s operations counter to the desired behavior, you can use Chat Completions instead (and skip the thousands of tokens in o1 models you don’t control).

Thank you for the 【{message idx}:{search idx}†{source}】 !

But I still struggle to understand what message idx is. It seems to be (the 1-based index of the user message the chunk was retrieved for) × 4, which is crazy, and it works most of the time, but not all of the time.

Does anyone have a better idea about this mess? I was not able to find out more through code or documentation.
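Even without decoding what the indices mean, the marker format is regular enough to pull apart. A small sketch, assuming the 【{message idx}:{search idx}†{source}】 shape quoted above:

```python
import re

# Matches the marker format from the leaked tool instructions:
# 【{message idx}:{search idx}†{source}】
CITATION = re.compile(r'【(\d+):(\d+)†([^】]+)】')

def parse_markers(text):
    """Return a (message_idx, search_idx, source) tuple for every marker."""
    return [(int(m), int(s), src) for m, s, src in CITATION.findall(text)]

parsed = parse_markers("See 【3:0†source】 and 【12:2†report.pdf】")
# parsed == [(3, 0, "source"), (12, 2, "report.pdf")]
```

That at least lets you log the indices across many requests and look for a pattern yourself.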


Once you realize that the message includes an array of citations, each of which provides the exact citation text as it appears in the message, you can simply iterate over that array and do a string replace for each one, replacing the citation text with an empty string. It’s only a line or two of code.
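A minimal sketch of that approach, assuming each annotation exposes the literal marker under a `text` field (as the Assistants message annotations do):

```python
def strip_citations(text, annotations):
    """Remove each annotation's literal marker text from the message."""
    for ann in annotations:
        text = text.replace(ann["text"], "")
    return text

cleaned = strip_citations(
    "Paris is the capital【4:1†guide.pdf】.",
    [{"text": "【4:1†guide.pdf】"}],  # hypothetical annotation entry
)
# cleaned == "Paris is the capital."
```

Unlike the blanket regex earlier in the thread, this only removes text the API itself flagged as a citation, so it can’t accidentally eat legitimate 【…】 content.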