What are 【N†source】 in OpenAI's Assistant API responses?

I’m using OpenAI’s Assistants API with the Retrieval tool and sometimes get weird-looking chunks like this:

... came from London.【25†source】Many days passed since then...

What does 【25†source】 stand for? It might be a reference to a source document, but the number 25 in the example above doesn’t make sense: I uploaded one docx document containing 2 pages, so it isn’t a page number. What is this, and how should I interpret or use it?
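For what it’s worth, the marker itself is easy to pull apart programmatically. A minimal sketch, using only the 【N†source】 pattern from the example above:

```python
import re

text = "... came from London.【25†source】Many days passed since then..."

# A marker is a number and a label separated by '†', inside corner brackets.
pattern = re.compile(r"【(\d+)†([^】]+)】")

for number, label in pattern.findall(text):
    print(number, label)  # prints "25 source"
```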


The prevailing theory is that the uploaded files are chunked for retrieval and these numbers identify the chunks retrieved. Try running a few queries in the playground and check the logs; the number of times the files are included across the requests should correspond to this number. But all of this is theory, I don’t know for sure.


Explanation


Great explanation, but it doesn’t seem to work that way, at least with Retrieval.

The annotations array always seems to be empty for some reason, at least for me when I’m running an assistant with multiple files. The answer is generated from the files, but it only cites them as [x†source]. Any way to fix this?


Hi! I’ve also noticed such cases (e.g. the annotations list is empty, but the answer contains placeholders like [x†source] or sandbox:/mnt/data/file...).

This must be a bug; otherwise I don’t know how to interpret such responses.
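When the annotations array is populated the way it’s supposed to be, each annotation carries the literal marker text plus the cited file ID, so resolving the markers is a simple substitution. A minimal sketch, assuming annotation dicts shaped like the API’s file_citation objects (the file ID here is made up):

```python
def resolve_citations(text_value, annotations):
    """Replace each annotation's inline marker with a numbered footnote
    and collect the cited file IDs.

    `annotations` is a list of dicts shaped like file_citation
    annotations: {"text": "【25†source】",
                  "file_citation": {"file_id": "file-abc123"}}.
    """
    footnotes = []
    for i, ann in enumerate(annotations, start=1):
        # Swap the raw marker for a readable footnote reference.
        text_value = text_value.replace(ann["text"], f"[{i}]")
        footnotes.append((i, ann["file_citation"]["file_id"]))
    return text_value, footnotes

# Hypothetical example data:
resolved, notes = resolve_citations(
    "... came from London.【25†source】Many days passed...",
    [{"text": "【25†source】", "file_citation": {"file_id": "file-abc123"}}],
)
# resolved == "... came from London.[1]Many days passed..."
# notes == [(1, "file-abc123")]
```

When the array comes back empty, of course, there is nothing to resolve, which is exactly the problem described above.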


I’ve closely evaluated the event-stream response from the myfiles_browser tool that handles RAG, and it isn’t returning the metadata that the quote_lines function uses when quoting context, so those inline citations flat out won’t work.


I am also having the same problem. Is there any way to delete this?


Same here, did you find a way?
I’m looking into deleting them post-creation.


Just in case it’s past files you uploaded that you can’t see, ask the GPT helper to “Delete all files right away then verify they are deleted”

If it does that and then you still see that you have invisible files, well, they’re probably a critical part of the system architecture and not something you’re meant to be able to delete.

I did it with substring in JavaScript:

// Fetch the latest message in the thread and grab its text.
let finalResponse = await openai.beta.threads.messages.list(run.thread_id);
let responseText = finalResponse.data[0].content[0].text.value;

// Truncate everything from the first citation marker onward.
let index = responseText.indexOf('【');
if (index !== -1) {
    responseText = responseText.substring(0, index) + '.';
}

Cool, thanks. I wrote some Python code that uses regular expressions to achieve the same thing.

import re

string = "Sample【25†source】"

# Remove anything wrapped in corner brackets.
regex_pattern = r"【.*?】"

cleaned_string = re.sub(regex_pattern, '', string)

# Result is "Sample"

Often, and for an unknown reason, the text comes back either empty or as a completely different string.

For those cases we respectively attach the citation at the bottom, or append instead of replacing.

I’m having the same issue. The response comes with 【13†source】 but the annotations array is always empty.
For our use case, it’s really important to know which files were used to generate the answer.


The brackets used in the response are called corner brackets. I included the lines below in my assistant’s instructions, and it seems to be working and not including the source:

You should never cite the source of your response. You should never include corner brackets in your response.


This worked like a charm for me! Thanks for sharing 🙂

Glad to hear it. I did find that it slipped through a few times, so the most reliable way is still to parse it out of the API response.

For those of us who use JavaScript, you can make the filter like this:

// Strip markers in either format, e.g. 【25†source】 or 【13:2†source】.
responseText = responseText.replace(/【\d+(?::\d+)?†[^】]+】/g, '');