I’ve uploaded some PDF documents and converted them to vector stores in the “Storage” section of the API dashboard. When I ask the assistant to answer a question from the PDF and cite the document, it returns text with a reference like “【4:1†source】”. What does this mean, and does anyone know how I can get a proper reference from it?
The Assistants API and its file_search tool can no longer provide quality annotations. Previously, v1 retrieval would mark a section of text to be returned as an annotation.
The instructions to the AI model are still present in the file search tool, though, telling the AI to write citation annotations in that style extensively. This can produce useless output that you may need to strip. At best, you get a chunk number: a pointer to one ~800-token chunk of text returned by the search within the run steps.
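If you decide to strip the markers, a small regex pass over the returned text is enough. This is a minimal sketch, assuming every marker has the 【n:m†label】 shape seen in this thread; `parse_citations` and `strip_citations` are hypothetical helper names, and the two numbers are returned uninterpreted because their meaning is not documented.

```python
import re

# Matches markers such as "【4:1†source】". The leading \s* also removes the
# whitespace before a marker so that "2023 【4:1†source】." becomes "2023.".
# Assumption: markers always follow the 【n:m†label】 shape.
CITATION_RE = re.compile(r"\s*【(\d+):(\d+)†([^】]*)】")


def parse_citations(text: str) -> list[tuple[int, int, str]]:
    """Return (first_number, second_number, label) for each marker, uninterpreted."""
    return [(int(a), int(b), label) for a, b, label in CITATION_RE.findall(text)]


def strip_citations(text: str) -> str:
    """Remove every citation marker together with the whitespace before it."""
    return CITATION_RE.sub("", text)


answer = "Revenue grew 12% in 2023 【4:1†source】. Costs fell 【4:3†source】."
print(parse_citations(answer))  # [(4, 1, 'source'), (4, 3, 'source')]
print(strip_citations(answer))  # Revenue grew 12% in 2023. Costs fell.
```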
You can see how the AI is instructed to use a vector store in this forum post (which reports the poor and undocumented quality). The tool language reflects a confused implementation: even in the newest version of the tool instructions, there is no “document title” in the tool return.
You can see a walkthrough of how it used to work (at great expense):
Or see how to discard them, rather than telling the AI “file_search annotation: disabled”:
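Rather than guessing with a regex, you could delete exactly the substrings the API reported as annotations. This is a minimal sketch, assuming each annotation exposes the literal marker string as `.text`; `Annotation` is a hypothetical stand-in so the example runs without the SDK, and `drop_annotations` is an illustrative name, not an API function.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Annotation:
    # Hypothetical stand-in for an SDK annotation object; only .text is read.
    text: str  # the literal marker substring, e.g. "【4:1†source】"


def drop_annotations(value: str, annotations: Iterable[Annotation]) -> str:
    """Remove every annotated marker from the message text."""
    for ann in annotations:
        if not ann.text:
            continue  # an empty marker would otherwise strip every space
        # Remove the marker together with one leading space, then any bare copies.
        value = value.replace(" " + ann.text, "").replace(ann.text, "")
    return value


msg = "Revenue grew 12% in 2023 【4:1†source】. Costs fell 【4:3†source】."
anns = [Annotation("【4:1†source】"), Annotation("【4:3†source】")]
print(drop_annotations(msg, anns))  # Revenue grew 12% in 2023. Costs fell.
```

With the OpenAI Python SDK this would presumably be called as `drop_annotations(text.value, text.annotations)` where `text = message.content[0].text`; the function only relies on a `.text` attribute, so the SDK's own annotation objects should work in place of the stand-in.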
I guess I will be forced to use LangChain, but I have limited time, and it is difficult to know whether I am getting the best answer. I saw the OpenAI PDF assistant; it was very easy to use but obviously lacks necessary features such as citations, which is very sad.