I’m using OpenAI’s Assistants API with the Retrieval tool and sometimes get odd-looking chunks in the response, like this:
... came from London.【25†source】Many days passed since then...
What does 【25†source】 stand for? It might be a reference to a source document, but the number 25 in the example above doesn’t make sense: I uploaded one docx document containing two pages, so it isn’t a page number. What is this, and how do I interpret or use it?
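For what it’s worth, the documented design is that each 【x†source】 marker in the message text has a matching entry in the message’s annotations array, which carries the cited file id and the character span of the marker. A minimal sketch of how you’d resolve the markers into readable citations, assuming the annotations array is actually populated (the field names follow the Assistants API message schema; the helper name resolveCitations and the filename map are my own):

```javascript
// Replace each 【x†source】 marker with a readable "[filename]" reference.
// `annotations` is the array from message.content[0].text.annotations;
// `filenameById` maps file ids to display names (e.g. from openai.files.retrieve).
function resolveCitations(text, annotations, filenameById) {
  let result = text;
  for (const ann of annotations) {
    // file_citation annotations carry the id of the file the chunk came from
    const fileId = ann.file_citation && ann.file_citation.file_id;
    const name = filenameById[fileId] || 'unknown file';
    // ann.text is the literal marker string as it appears in the message
    result = result.replace(ann.text, `[${name}]`);
  }
  return result;
}

// Mock content shaped like an API response, using the example from above:
const text = '... came from London.【25†source】Many days passed since then...';
const annotations = [
  { text: '【25†source】', file_citation: { file_id: 'file-abc123' } },
];
console.log(resolveCitations(text, annotations, { 'file-abc123': 'notes.docx' }));
// → ... came from London.[notes.docx]Many days passed since then...
```

The number before the † appears to be an internal chunk index, not anything tied to your document’s page count, so the annotations array is the only reliable way to map a marker back to a file.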
The prevailing theory is that the uploaded files are chunked for retrieval and these numbers refer to the chunks retrieved. Try running a query in the playground and checking the logs: the number of times the files are included across the requests should correspond to this number. But that’s only a theory; I don’t know for sure.
Great explanation, but it doesn’t seem to work that way, at least with Retrieval.
The annotations array always seems to be empty for some reason, at least for me when I’m running an assistant with multiple files. The answer is generated from the files, but it only comes back with [x†source]. Any way to fix this?
I’ve closely examined the event-stream response from the myfiles_browser tool that handles RAG, and it isn’t returning the metadata that the quote_lines function uses when quoting context, so those inline citations flat-out won’t work.
Just in case it’s past files you uploaded that you can’t see, ask the GPT helper to “Delete all files right away then verify they are deleted”
If it does that and then you still see that you have invisible files, well, they’re probably a critical part of the system architecture and not something you’re meant to be able to delete.
// Fetch the latest assistant message from the thread
const finalResponse = await openai.beta.threads.messages.list(run.thread_id);
let responseText = finalResponse.data[0].content[0].text.value;
// Strip every 【x†source】 marker; truncating at the first one
// would also drop any legitimate text that follows it
responseText = responseText.replace(/【[^】]*】/g, '');
I’m having the same issue. The response comes with 【13†source】 but the annotations array is always empty.
For our use case, it’s really important to know which files have been used to generate the answer.
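If the annotations array does come back populated (several posters above report it arriving empty), the cited files can be recovered from it directly. A sketch under that assumption, with the annotation field names following the Assistants API message schema and the helper name citedFileIds being my own:

```javascript
// Collect the distinct file ids cited by a message's annotations.
// `message` is one entry from openai.beta.threads.messages.list(...).data
function citedFileIds(message) {
  const ids = new Set();
  for (const part of message.content) {
    if (part.type !== 'text') continue;
    for (const ann of part.text.annotations) {
      // file_citation annotations point at the retrieval source file
      if (ann.file_citation) ids.add(ann.file_citation.file_id);
    }
  }
  return [...ids];
}

// Usage against a live thread (untested sketch):
// const msgs = await openai.beta.threads.messages.list(run.thread_id);
// for (const id of citedFileIds(msgs.data[0])) {
//   const file = await openai.files.retrieve(id); // file.filename is the upload name
// }
```

openai.files.retrieve is the documented way to turn a file id back into a filename, so this gives you the list of source documents per answer, which is exactly the traceability described above.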
The brackets used in the response are called corner brackets. I included the lines below in my assistant instructions, and it seems to be working and no longer includes the source markers:
You should never cite the source of your response. You should never include corner brackets in your response.