Can I use OpenAI Agent SDK's FileSearchTool() to query the knowledge stored in ChromaDB's vector stores?

I am building an agent using OpenAI’s Agent SDK. Meanwhile, I created vector stores using the open-source vector database Chroma DB. I want to use the agent created by the OpenAI Agent SDK to query the knowledge base stored in Chroma DB. Right now, they each work separately. How can I make them work together? Below are two blocks of code; they work separately.

1.I am building an agent using OpenAI’s Agent SDK, shown below, called knowledge_agen. I uploaded PDF documents to OpenAI storage and generated vector stores automatically on OpenAI’s platform, so the knowledge_agent can query the vector stores using Agent SDK’S FileSearchTool. In FileSearchTool, I defined the vector store ID to query, notice this vector store ID is based on OpenAI’s platform. The code is like the following, and this works.

knowledge_agent = Agent(
    name="Knowledge_Advisor",
    instructions=INSTRUCTIONS,
    tools=[FileSearchTool(
            max_num_results=8,
            vector_store_ids=["stekekekek"]],
            include_search_results=True,
    )],
    model_settings=ModelSettings(tool_choice="required"),
)
  1. However, I am looking for a more scalable approach, particularly I am looking for an open-source platform that can support CRUD vector stores. Chroma DB is such a choice that fits my use case. I am building vector stores using Chroma DB by defining the PDF file path and embedding model, then creating a collection of the knowledge base (named collection). Below is the code:
 # use ChromaDB's default model for embedding: All-MiniLM-L6-v2 (free)
 default_ef = embedding_functions.DefaultEmbeddingFunction()
 # vector store path
 vector_store = "vector_store/guide"
 if not vector_store:
     os.mkdir(vector_store)
 
# create a vector store called collection, this is the knowledge base I want to query by agent
 client = chromadb.PersistentClient(path=vector_store)
 collection = client.get_or_create_collection(name="guideline", 
                                     embedding_function=default_ef)
 
#query the knowledge from the vector store directly
results = collection.query(
    query_texts=["which vehicle is most expensive"],
    n_results=2
)

As shown, I can query the knowledge base (named collection) directly from Chroma DB. But I want to use OpenAI’s Agent to query it.

3.My question is: I want the agent (using OpenAI Agent SDK) to search the knowledge vectors stored in Chroma DB (collection). But the FileSearchTool() in the agent in (1) above can only allow to input vector_store_ids from OpenAI storage. I don’t know how to define the Tool from OpenAI agent to search external vector stores (in this case, collection stored in Chroma DB). How can I integrate the OpenAI Agent SDK shown above with Chroma DB vector stores? Can you show me some pipeline or functions of how to connect these two together? Thank you!

One thing you can try is to write your own custom function, that returns the results to the agent.

Here is an example that you can use as a reference.

1 Like

Thank you @aprendendo.next but it is not very clear for me how. Can you specify it more further. Through which function from the agent perspective, to feed the retriever to the Agent?

A custom function, where you receive the request to search something and implement by yourself a vector search using your own implementation. Then you return the text with the results for the model.

It can be done by agent sdk or responses API.

https://platform.openai.com/docs/guides/function-calling?api-mode=responses&example=search-knowledge-base

2 Likes

@Mandy_He - You cannot use the FileSearchTool () but instead like @aprendendo.next suggested, write a function that does vector search for a given query and provide that as a tool call to the agent.