When uploading files to a vector store to be used for RAG by an Assistant, I am wondering whether the filename of each file has an impact on retrieval.
Currently my files have random filenames (the filenames are actually some sort of ID, but they carry no semantic value). I am wondering if naming the files with a title that summarizes their content might help retrieval.
Standard chunking methods typically disregard the filename when splitting data. However, you can create a custom chunking strategy that appends metadata, such as the filename, to each chunk.
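To make that concrete, here is a minimal sketch in plain Python of what I mean. It is not OpenAI's or LangChain's API, just an illustration: the function name `chunk_with_filename`, the `[source: ...]` header format, and the character-based sizes are all assumptions of mine. The idea is that each chunk carries the filename in its text, so the filename gets embedded together with the content.

```python
def chunk_with_filename(text: str, filename: str,
                        chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows and prefix each
    window with the filename (hypothetical helper, not a library call)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        body = text[start:start + chunk_size]
        # The header format is an arbitrary choice for illustration.
        chunks.append(f"[source: {filename}]\n{body}")
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

You would then embed and store these strings yourself instead of the raw file contents. Note that this only helps if the filename is descriptive; prefixing every chunk with a random ID would probably add noise rather than signal.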
There is a metadata field, but it seems to be a static map that is the same for all files, since it is only present in the “create vector store” API and not in the “create file” API:
Could you provide a pointer to the part of the documentation you had in mind regarding defining a custom chunking strategy that includes the filename in a file’s metadata?
As for OpenAI’s vector store, I haven’t worked with it. From what I’ve seen in the documentation provided, its capabilities seem fairly limited compared to LangChain.