The Assistants / GPTs cite their sources in the form 【43†source】 (tested in the Playground). However, it is unclear to me how this source reference maps to the file name.
I have seen here that this is also unclear to others, but in my opinion that deserves a separate post.
Has anyone figured out the mapping?
One approach that got me a bit further, although I cannot yet tell whether the returned information is correct, is the prompt:
Please create a bibliography of your sources, mentioning the source 【】 and the title / filename of the source.
However, it would be better if I could derive the file name + page / chunk from 【43†source】.
Just tried some more prompts, with the result:
My apologies for the confusion earlier. If you only have the citation reference 【43†source】, without any previous context provided by me, you would not be able to directly derive the filename from that citation reference alone. The citation reference is a link I use to refer back to the specific source which I have quoted. To retrieve the filename, you would need the full context or the accompanying bibliography entry. Without this provided by me, you would not be able to determine the filename solely from the citation reference 【43†source】. To access the filename, you would typically rely on me quoting the title of the file, as in the example you cited, or referencing the file’s title in a bibliography I provide.
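For what it's worth, the Assistants API itself exposes this mapping: each text part of a message carries an annotations array, and a file_citation annotation contains the file_id of the cited file, which you can resolve to a filename with a Files API call. A minimal sketch (the thread ID is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

# Placeholder thread ID; use the thread your run wrote to.
messages = client.beta.threads.messages.list(thread_id="thread_abc123")

for message in messages.data:
    for part in message.content:
        if part.type != "text":
            continue
        for ann in part.text.annotations:
            # ann.text is the raw marker as it appears in the text,
            # e.g. "【43†source】".
            if ann.type == "file_citation":
                cited = client.files.retrieve(ann.file_citation.file_id)
                print(ann.text, "->", cited.filename)
```

As far as I can tell, page / chunk information is not exposed this way, but at least the filename is recoverable without prompting for a bibliography.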
I have a similar issue in the Playground: I get source markers, but the annotation sections are empty when I look in the Playground logs.
For example, the text section of the response contains
“The Canaanite Hyksos:\n - The Hyksos were a Canaanite dynasty that ruled over Egypt, bringing with them northern military tools and exercising control over Egypt for a brief time during 1670-1570 BCE【21†source】”
but the corresponding annotation section is empty.
"annotations": []
Are there perhaps some file types (these are PDFs) where annotations are not supported? Or could the Playground behave differently from the API?
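To narrow down whether this is a Playground display problem or an API problem, you can fetch the same message over the API and print the annotations yourself. A quick sketch (IDs are placeholders; copy the real ones from the Playground logs view):

```python
from openai import OpenAI

client = OpenAI()

# Placeholder IDs from the Playground logs.
msg = client.beta.threads.messages.retrieve(
    message_id="msg_abc123", thread_id="thread_abc123"
)
for part in msg.content:
    if part.type == "text":
        print(part.text.value)        # contains markers like 【21†source】
        print(part.text.annotations)  # [] here means the API side is empty too
```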
I am also seeing this issue. I clearly have logs from earlier this month with annotations, but my recent ones show no annotations beyond the markers in the response text.
Assistants v1 will be shut off, and v2 has no retrieval tool that returns sections of text as annotations.
Assistants now uses file_search, returning chunks of knowledge from a vector store.
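For anyone migrating, the v2 setup looks roughly like this; a sketch assuming the current openai-python client (on newer SDK versions vector stores may live at client.vector_stores rather than client.beta.vector_stores), with a placeholder PDF name:

```python
from openai import OpenAI

client = OpenAI()

# Create a vector store and upload a file into it.
vector_store = client.beta.vector_stores.create(name="research-docs")
with open("my_document.pdf", "rb") as f:
    client.beta.vector_stores.files.upload_and_poll(
        vector_store_id=vector_store.id, file=f
    )

# Attach the store to an assistant via the file_search tool.
assistant = client.beta.assistants.create(
    model="gpt-4o",
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)
```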
Assistants also has a significant quality failure: the AI is given long file IDs rather than the original file names the chunks came from, which reduces understanding and increases token consumption.
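One workaround I have seen suggested (untested here, and the IDs below are placeholders) is to hand the model a file-ID-to-filename legend in the assistant instructions, so it can name its sources itself:

```python
from openai import OpenAI

client = OpenAI()

store_id = "vs_abc123"  # placeholder vector store ID
legend_lines = []
for vs_file in client.beta.vector_stores.files.list(vector_store_id=store_id):
    # Resolve each vector store file ID to its original filename.
    meta = client.files.retrieve(vs_file.id)
    legend_lines.append(f"{vs_file.id}: {meta.filename}")

client.beta.assistants.update(
    "asst_abc123",  # placeholder assistant ID
    instructions=(
        "When citing sources, use these original file names.\n"
        "File ID to filename mapping:\n" + "\n".join(legend_lines)
    ),
)
```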
Could someone explain what this means? This is what I got from the assistant when I asked what its annotation meant.
Certainly! Here’s a more detailed explanation of the citation process and its components, particularly as it pertains to the research documents we are discussing.
Citation Structure
The citation format “【8:3†source】” consists of three essential parts:
Message Index (8):
This is the index or number of the message generated by the browsing tool in our conversation. Each time I perform a search and output results, it gets numbered sequentially. The number helps to identify which set of search results the information comes from.
Search Index (3):
This indicates the position of the specific piece of information within the message. When a search is conducted, multiple results are fetched based on the query. The number corresponds to the order of the results in the output. In this case, “3” indicates that the particular data you’re referencing is the third result in the list of search outputs from that message.
Source (source):
This part signifies that the content referenced is coming from an original document or source, which could be a research paper, report, or any other type of file uploaded. The term “source” signifies that the information is not original to the response but is derived from another document.
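Side note: the breakdown above is the model's own account of its markers and may not be authoritative, but if you want to split such a marker into its parts programmatically, a small regex sketch does it:

```python
import re

# Matches 【8:3†source】 as well as the shorter 【43†source】 form.
MARKER = re.compile(r"【(\d+)(?::(\d+))?†([^】]+)】")

m = MARKER.match("【8:3†source】")
if m:
    message_index, search_index, label = m.groups()
    print(message_index, search_index, label)  # -> 8 3 source
```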
Purpose of Using Citations
The purpose of using such a structured citation format includes:
Verifiability: It allows you to track back to the original piece of content for verification purposes. If you want to read more about the information provided, you can find the citation in the context of the source document.
Credit: It gives credit to the original authors or creators of the material by acknowledging where the information was obtained.
Clarity: It helps clarify which specific piece of information is being discussed without confusion, especially when a document has numerous findings or data points.
Searching and Retrieving Information
When I perform a search using keywords or phrases related to your questions, I use the msearch function to look for relevant content in the uploaded files.
Search Query: For example, if you ask about unemployment trends, I would search using that phrase to locate relevant data within the document.
Results Generation: The tool outputs a list of matching results, each followed by its index number (e.g., 1, 2, 3…) along with references to the document it came from.
Incorporating Citations: When I provide information based on those results, I accompany it with citations like “【8:3†source】” to specify exactly which piece of information you can refer back to.
For example, if I found valuable information on unemployment trends in the research and provided a citation for it, you’d be able to see both the content and where it came from, making the information more trustworthy and transparent.
If you have more specific questions or aspects of this process you’d like to dive into, let me know!
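If you only want to hide these markers from end users rather than resolve them, stripping them with a regex before display is straightforward (example text taken from the response quoted earlier in this thread):

```python
import re

raw = ("The Hyksos were a Canaanite dynasty that ruled over Egypt "
       "during 1670-1570 BCE【21†source】")

# Drop every 【...】 citation marker from the display text.
clean = re.sub(r"【[^】]*】", "", raw)
print(clean)
```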