What are 【N†source】 in OpenAI's Assistant API responses?

I’m using OpenAI’s Assistants API with the Retrieval tool and sometimes get weird-looking chunks like this:

... came from London.【25†source】Many days passed since then...

What does 【25†source】 stand for? It might be a reference to a source document, but the number 25 in the example above doesn’t make sense: I uploaded one docx document containing 2 pages, so it isn’t a page number. What is this, and how should I interpret or use it?
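For what it’s worth, the marker itself is easy to pull apart programmatically. A minimal sketch, using only the 【N†source】 pattern from the example above:

```python
import re

text = "... came from London.【25†source】Many days passed since then..."

# A marker is a number and a label separated by '†', inside corner brackets.
pattern = re.compile(r"【(\d+)†([^】]+)】")

for number, label in pattern.findall(text):
    print(number, label)  # prints "25 source"
```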


The prevailing theory is that the uploaded files are chunked for retrieval and these numbers identify the chunks retrieved. Try running a few queries in the playground and check the logs; the number of times the files are included across the requests should correspond to this number. But all of this is theory, I don’t know for sure.


Explanation


Great explanation, but it doesn’t seem to work that way, at least with Retrieval.

The annotations array always seems to be empty for some reason, at least for me when I’m running an assistant with multiple files. The answer is generated from the files, but it only cites them as [x†source]. Any way to fix this?


Hi! I’ve also noticed such cases (e.g. the annotations list is empty, but the answer contains placeholders like [x†source] or sandbox:/mnt/data/file...).

This must be a bug; otherwise I don’t know how to interpret such responses.
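When the annotations array is populated the way it’s supposed to be, each annotation carries the literal marker text plus the cited file ID, so resolving the markers is a simple substitution. A minimal sketch, assuming annotation dicts shaped like the API’s file_citation objects (the file ID here is made up):

```python
def resolve_citations(text_value, annotations):
    """Replace each annotation's inline marker with a numbered footnote
    and collect the cited file IDs.

    `annotations` is a list of dicts shaped like file_citation
    annotations: {"text": "【25†source】",
                  "file_citation": {"file_id": "file-abc123"}}.
    """
    footnotes = []
    for i, ann in enumerate(annotations, start=1):
        # Swap the raw marker for a readable footnote reference.
        text_value = text_value.replace(ann["text"], f"[{i}]")
        footnotes.append((i, ann["file_citation"]["file_id"]))
    return text_value, footnotes

# Hypothetical example data:
resolved, notes = resolve_citations(
    "... came from London.【25†source】Many days passed...",
    [{"text": "【25†source】", "file_citation": {"file_id": "file-abc123"}}],
)
# resolved == "... came from London.[1]Many days passed..."
# notes == [(1, "file-abc123")]
```

When the array comes back empty, of course, there is nothing to resolve, which is exactly the problem described above.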


I’ve closely evaluated the event-stream response from the myfiles_browser tool that handles RAG, and it isn’t returning the metadata that the quote_lines function uses when quoting context, so those inline citations flat out won’t work.


I am also having the same problem. Is there any way to delete this?


Same here, did you find a way?
I’m looking into deleting them post-creation.


Just in case it’s past files you uploaded that you can’t see, ask the GPT helper to “Delete all files right away then verify they are deleted”

If it does that and then you still see that you have invisible files, well, they’re probably a critical part of the system architecture and not something you’re meant to be able to delete.

I did it with substring in JavaScript:

// Fetch the latest message in the thread and grab its text.
let finalResponse = await openai.beta.threads.messages.list(run.thread_id);
let responseText = finalResponse.data[0].content[0].text.value;

// Truncate everything from the first citation marker onward.
let index = responseText.indexOf('【');
if (index !== -1) {
    responseText = responseText.substring(0, index) + '.';
}

Cool, thanks. I wrote some Python code that uses regular expressions to achieve the same thing.

import re

string = "Sample【25†source】"

# Remove anything wrapped in corner brackets.
regex_pattern = r"【.*?】"

cleaned_string = re.sub(regex_pattern, '', string)

# Result is "Sample"

Often, and for an unknown reason, the text comes back either empty or as a completely different string.

For those cases we respectively attach the citation at the bottom, or append instead of replacing.

I’m having the same issue. The response comes with 【13†source】 but the annotations array is always empty.
For our use case, it’s really important to know which files were used to generate the answer.


The brackets used in the response are called corner brackets. I included the lines below in my assistant’s instructions, and it seems to be working and not including the source:

You should never cite the source of your response. You should never include corner brackets in your response.


This worked like a charm for me! Thanks for sharing 🙂

Glad to hear it. I did find that it slipped through a few times, so the most reliable way is still to parse it out of the API response.

For those of us who use JavaScript, you can make the filter like this:

// Strip markers in either format, e.g. 【25†source】 or 【13:2†source】.
responseText = responseText.replace(/【\d+(?::\d+)?†[^】]+】/g, '');