How to make AI aware of filenames and file count in a vector store?

SoftTimur · June 5, 2025, 11:51pm

Hello,

I have the following code, where I create a vector store that contains one file. In my tests, if userMessage is What's in "a_special_file.pdf"?, the AI is able to read the content of the file. However, if userMessage is What's the name of the files in your knowledge_base? or How many files do you have in your knowledge_base?, the AI is unable to provide a precise answer.

Is this behavior expected? What is the conventional way to make the AI aware of information such as filenames, the number of files in a vector store, and similar metadata?

import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Upload a local file so it can be attached to a vector store
async function createFile(filePath) {
  const result = await openai.files.create({
    file: fs.createReadStream(filePath),
    purpose: "assistants",
  });
  return result.id;
}

// Example usage
const fileId = await createFile(
  "./documents/a_special_file.pdf"
);
console.log(fileId);

// Create a vector store
const vectorStore = await openai.vectorStores.create({
  name: "knowledge_base",
});
console.log(vectorStore.id);

// Add the file to the vector store
await openai.vectorStores.files.create(vectorStore.id, { file_id: fileId });

async function waitForFileToBeProcessed(vectorStoreId, fileId, timeoutMs = 60000) {
  const start = Date.now();
  while (Date.now() - start < timeoutMs) {
    const fileList = await openai.vectorStores.files.list(vectorStoreId);
    const fileEntry = fileList.data.find(f => f.id === fileId);
    
    if (fileEntry && fileEntry.status === 'completed') {
      return true;
    }
    if (fileEntry && fileEntry.status === 'failed') {
      throw new Error(`File processing failed: ${fileEntry.last_error?.message || 'unknown error'}`);
    }
    await new Promise(r => setTimeout(r, 2000)); // wait 2 seconds before next check
  }
  throw new Error('Timed out waiting for file to be processed.');
}

await waitForFileToBeProcessed(vectorStore.id, fileId);

// Check status
const result = await openai.vectorStores.files.list(vectorStore.id);
console.log(result);

const response = await openai.responses.create({
  model: "gpt-4.1-2025-04-14",
  input: [
    {
      role: "user",
      content: userMessage
    }
  ],
  tools: [
    {
      type: "file_search",
      vector_store_ids: [vectorStore.id]
    }
  ],   
});

_j · June 6, 2025, 12:45am

A vector store is accessed by a hosted internal semantic search tool.

The AI can only write a query, and that query returns chunks from across the entire vector store. With Assistants, the search covers both the vector store attached to the assistant and the one attached to the thread.

Therefore, there is no asking for a file by filename; there is only asking for knowledge.

OpenAI also provides nothing about what is in the vector store to the AI. The only thing you get is a constant system message added that says “the user has uploaded files”, even when that is untrue.
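
Since nothing about the store's contents reaches the model, one workaround is to build a manifest yourself and pass it as instructions. Below is a minimal sketch of that idea, not an official pattern. The helper names (listVectorStoreFilenames, buildManifestInstructions, askWithManifest) and the wording of the instructions are my own assumptions; the SDK calls are the same ones used in the question, plus files.retrieve to look up each filename. `client` is an OpenAI SDK instance such as the `openai` object above.

```javascript
// Collect { id, filename } for every file attached to the vector store.
// `client` is passed in so the helper can also be exercised with a stub.
async function listVectorStoreFilenames(client, vectorStoreId) {
  const files = [];
  // The SDK's list() result is async-iterable and fetches further pages as needed.
  for await (const entry of client.vectorStores.files.list(vectorStoreId)) {
    // Look the filename up on the Files API using the entry's file ID.
    const file = await client.files.retrieve(entry.id);
    files.push({ id: entry.id, filename: file.filename });
  }
  return files;
}

// Turn the list into plain-text instructions.
// The exact wording is an assumption; adjust it to your use case.
function buildManifestInstructions(storeName, files) {
  const lines = files.map((f, i) => `${i + 1}. ${f.filename}`);
  return [
    `The vector store "${storeName}" contains ${files.length} file(s):`,
    ...lines,
    "Use the file_search tool when a question concerns the content of these files.",
  ].join("\n");
}

// Hypothetical wrapper: same request as in the question, plus the manifest.
async function askWithManifest(client, vectorStoreId, storeName, userMessage) {
  const files = await listVectorStoreFilenames(client, vectorStoreId);
  return client.responses.create({
    model: "gpt-4.1-2025-04-14",
    instructions: buildManifestInstructions(storeName, files),
    input: [{ role: "user", content: userMessage }],
    tools: [{ type: "file_search", vector_store_ids: [vectorStoreId] }],
  });
}
```

With this in place, questions about filenames or file count are answered from the instructions, while questions about content still go through the search tool. For a large store you would probably want to cache the manifest rather than rebuild it on every request.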

So:

You’ll need to tell the AI when the file search tool (aka myfiles_browser in some models) should be used and when it is useful. Otherwise its use is completely random and cost-doubling.
The searches the AI writes will have to match the topical information in the files, chunk by chunk.
The returned indexed items should give original filenames (like ChatGPT), but I haven’t gone back recently to check whether the context is instead being filled with long, incompressible file ID numbers.
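
If you want to check for yourself what the search returned, you can inspect the response object. The sketch below assumes the Responses API output shape as I understand it: items of type file_search_call carrying a results array (populated only when the request sets include: ["file_search_call.results"]), and file_citation annotations on the message text, each with a filename field. Treat those field names as assumptions and log a real response to confirm them. collectSourceFilenames is a hypothetical helper name.

```javascript
// Gather the distinct filenames that file_search returned or that the answer cites.
// Field names are assumptions based on the Responses API output format.
function collectSourceFilenames(response) {
  const names = new Set();
  for (const item of response.output ?? []) {
    if (item.type === "file_search_call") {
      // Raw search hits; null or absent unless results were requested via `include`.
      for (const hit of item.results ?? []) {
        if (hit.filename) names.add(hit.filename);
      }
    }
    if (item.type === "message") {
      // Citations attached to the generated answer text.
      for (const part of item.content ?? []) {
        for (const note of part.annotations ?? []) {
          if (note.type === "file_citation" && note.filename) names.add(note.filename);
        }
      }
    }
  }
  return [...names];
}
```

Note that this only shows which files a particular search touched. It does not enumerate the store, so it cannot answer "how many files do you have?" on its own.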
