File Tree in Vector Storage

sqlliot · September 25, 2024, 12:15am

Is there a best practice for having a file tree in the vector store?

Some other resources online suggested appending a comment with the full file path to each file at the time of upload (which sounds really easy to implement). Has anyone tried that, and did it seem to help the Agent understand the directory structure?

Has anyone tried other methods or hacks to make the Agent recognize the file tree?
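
A minimal sketch of the path-comment idea described above, in case it helps anyone trying it. The names COMMENT_STYLE and with_path_header are hypothetical, and the "File:" header format is an assumption rather than anything the vector store requires.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to (comment opener, comment closer).
COMMENT_STYLE = {
    ".js": ("// ", ""),
    ".py": ("# ", ""),
    ".html": ("<!-- ", " -->"),
}


def with_path_header(rel_path: str, source: str) -> str:
    """Return source with a first-line comment naming its repo-relative path."""
    style = COMMENT_STYLE.get(PurePosixPath(rel_path).suffix)
    if style is None:
        # Unknown file type: leave the content unchanged rather than risk
        # writing an invalid comment into it.
        return source
    opener, closer = style
    return f"{opener}File: {rel_path}{closer}\n{source}"
```

Note that if the store splits files into chunks, only the chunk containing the top of the file would carry this header, and files that must start with a specific first line (such as a shebang) would need the header placed elsewhere.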

jochenschultz · September 25, 2024, 1:10am

Could you explain what you are trying to achieve?

_j · September 25, 2024, 1:11am

The AI is given a myfiles_browser tool, to which it can write queries in a search query language. It is a good idea to tell the AI what the myfiles_browser tool will return when the contents of the vector store are part of the assistant’s behavior and operation.

However, in practice the AI is simply given the ability to write a search query, and the chunks most similar to that query are returned. It cannot explore.

The top-ranked chunks come back labeled with their source file and an index number. However, the AI has no way to specify or explore a subdomain of documents.

Therefore:

“recognize the file tree” doesn’t have much meaning.

If the user is the one adding documents, and those documents are returned by the same search query as any other vector stores in operation, then I can see it being a good idea to amend the AI’s knowledge. The file name alone, for example in a post-prompt automatically added to a message such as “[I uploaded 2835.385.pdf]”, might be a good way to ‘activate’ more searching.

However, you can see that the file name alone could be less than useful. Imagine if your user interface uploader also had a dialog that asked for “Contents: (what’s in the document you are adding)”. Forcing the user to spell it out, rather than relying on the user’s message to mention the file, might improve the searching, but could be cumbersome.
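
The post-prompt and the optional "Contents" field above could be sketched roughly as follows. The function names and the exact note format beyond "[I uploaded ...]" are assumptions for illustration only.

```python
def upload_note(filename: str, contents: str = "") -> str:
    """Build a bracketed note announcing an upload, optionally with a description."""
    note = f"[I uploaded {filename}"
    contents = contents.strip()
    if contents:
        # Hypothetical format for the user-supplied "Contents" dialog field.
        note += f". Contents: {contents}"
    return note + "]"


def with_upload_notes(message: str, uploads: list) -> str:
    """Append one note per (filename, contents) pair after the user's message."""
    notes = [upload_note(name, contents) for name, contents in uploads]
    return "\n".join([message, *notes])
```

The notes are appended after the user's text so the message itself stays unchanged; whether this actually triggers more searching would need to be checked against your own assistant.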

sqlliot · October 2, 2024, 9:50pm

I’m trying to let the AI differentiate between similarly named files with different file paths.

A real-world example could be a git repo for a Node.js website: index.html and index.js are “reserved” filenames that represent the default entry point for a directory or the end of a route. There might be multiple index.js files in a single git repo, each in a different folder and each unique. They all share the index.js filename, but the folder structure gives each file a unique route/path/URL that differentiates it from the other index.js files.
Similarly, an API might have multiple route.js files, each at the end of a unique path of folders.

If I wanted the AI to answer questions about the code in a specific index.js file, I would need to differentiate that specific file from all of the others with the same name.
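
To see how widespread the collision is in a given repo, something like the sketch below could list the basenames that appear under more than one path. The function name ambiguous_basenames is hypothetical, and paths are assumed to be repo-relative with forward slashes.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def ambiguous_basenames(paths):
    """Map each basename that occurs more than once to its sorted full paths."""
    by_name = defaultdict(list)
    for path in paths:
        by_name[PurePosixPath(path).name].append(path)
    # Keep only names shared by two or more files.
    return {name: sorted(group) for name, group in by_name.items() if len(group) > 1}
```

Any basename in the result is one where the filename alone cannot tell the AI which file a chunk came from.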

jochenschultz · October 2, 2024, 10:00pm

There are multiple things around the file itself which are way more interesting. There could be meeting protocols, or chats with ChatGPT that led to the production of the file.
You may also want to move the files, which would mean you’d have to rebuild the vector store.
Moving files around without touching their contents shouldn’t require that.

So you should at least annotate the file path as metadata, but if possible also add the prompts that led to the creation of the code.
This could make it a lot easier for auto coders to work on the code.
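
One way to keep the path as metadata rather than baking it into the file is a caller-side record per file, sketched below. The record fields and function names are hypothetical, and the thread does not say whether the vector store itself can hold such metadata, so this assumes you keep it on your own side.

```python
import hashlib
from typing import Optional


def make_record(path: str, data: bytes, origin_prompt: Optional[str] = None) -> dict:
    """Describe one file: where it lives, a hash of its bytes, and its origin."""
    return {
        "path": path,
        "sha256": hashlib.sha256(data).hexdigest(),
        "origin_prompt": origin_prompt,  # optional prompt that produced the code
    }


def is_pure_move(old_record: dict, new_path: str, new_data: bytes) -> bool:
    """True when the bytes are unchanged and only the path differs."""
    same_content = old_record["sha256"] == hashlib.sha256(new_data).hexdigest()
    return same_content and old_record["path"] != new_path
```

When is_pure_move returns True, only the stored path needs updating, and the file contents would not need to be uploaded again.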

I mean I often just copy and paste a chat with a customer into chatgpt and then start prompting…

sqlliot · October 3, 2024, 7:23am

I am adding all of the files programmatically during an initialization workflow. Later, on certain future event triggers, files will be programmatically updated / deleted / replaced / etc as needed.

For now, I’ve switched from the ‘files’ API to the ‘uploads’ API. That way I can change the filename during the upload. I am setting the filename on OpenAI to be (or at least contain) the entire file path.

That should at least create a semblance of the file tree or file structure; hopefully it will be effective for my use cases.
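
For anyone wanting to try the same approach, here is a rough sketch of turning a repo-relative path into a flat filename and back. The "__" separator and both function names are assumptions, since the thread does not say which characters the upload filename may contain; the actual upload call is left out.

```python
from pathlib import PurePosixPath

SEPARATOR = "__"  # hypothetical stand-in for "/" inside the uploaded filename


def path_to_upload_name(rel_path: str) -> str:
    """Flatten a repo-relative path such as src/routes/index.js into one name."""
    path = PurePosixPath(rel_path)
    if not path.parts or path.is_absolute() or ".." in path.parts:
        raise ValueError(f"expected a repo-relative path, got {rel_path!r}")
    return SEPARATOR.join(path.parts)


def upload_name_to_path(name: str) -> str:
    """Recover the original path from a flattened upload name."""
    return "/".join(name.split(SEPARATOR))
```

The round trip only works if no folder or file name already contains the separator, so pick one that does not occur in your repo.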
