I am building an agent with OpenAI’s Agents SDK. Separately, I created vector stores with the open-source vector database Chroma DB. I want the agent built with the Agents SDK to query the knowledge base stored in Chroma DB. Right now each part works on its own; how can I make them work together? Below are the two blocks of code, each of which works separately.
1. I am building an agent called knowledge_agent with OpenAI’s Agents SDK, shown below. I uploaded PDF documents to OpenAI storage, and OpenAI’s platform generated vector stores from them automatically, so knowledge_agent can query those vector stores through the SDK’s FileSearchTool. In FileSearchTool I specify the ID of the vector store to query; note that this ID refers to a vector store on OpenAI’s platform. The code is as follows, and it works:
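```python
from agents import Agent, FileSearchTool, ModelSettings

# INSTRUCTIONS: the agent's system prompt string, defined elsewhere
knowledge_agent = Agent(
    name="Knowledge_Advisor",
    instructions=INSTRUCTIONS,
    tools=[
        FileSearchTool(
            max_num_results=8,
            vector_store_ids=["stekekekek"],  # vector store ID on OpenAI's platform
            include_search_results=True,
        )
    ],
    model_settings=ModelSettings(tool_choice="required"),  # always call the tool
)
```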
2. However, I am looking for a more scalable approach, specifically an open-source platform that supports CRUD operations on vector stores. Chroma DB fits my use case. I build the vector store in Chroma DB by defining the PDF file path and the embedding model, then creating a collection for the knowledge base (the variable `collection`, with the collection name "guideline"). Below is the code:
```python
import os

import chromadb
from chromadb.utils import embedding_functions

# use ChromaDB's default embedding model: all-MiniLM-L6-v2 (free, runs locally)
default_ef = embedding_functions.DefaultEmbeddingFunction()

# vector store path; create the folder if it does not exist yet
# (the original `if not vector_store:` was always False for a non-empty string)
vector_store = "vector_store/guide"
os.makedirs(vector_store, exist_ok=True)

# persistent client + collection "guideline": the knowledge base I want the agent to query
client = chromadb.PersistentClient(path=vector_store)
collection = client.get_or_create_collection(
    name="guideline",
    embedding_function=default_ef,
)

# query the knowledge base directly
results = collection.query(
    query_texts=["which vehicle is most expensive"],
    n_results=2,
)
print(results["documents"])
```
With this setup, I can query the knowledge base (`collection`) directly from Chroma DB. What I want is for the OpenAI agent to query it instead.
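Because `collection.query` returns a dict of parallel, per-query lists (`ids`, `documents`, `metadatas`, `distances`), something has to flatten that into plain text before a language model can read it. A minimal sketch of such a formatter follows; the function name `format_chroma_results` and the output layout are illustrative assumptions, and it reads only the first query's results because an agent tool call would send one query at a time.

```python
from typing import Any, Mapping


def format_chroma_results(results: Mapping[str, Any]) -> str:
    """Turn a Chroma collection.query() result into numbered text passages.

    Chroma returns one inner list per query text; only the first query
    (index 0) is read here. Lower distance means a closer match.
    """
    documents = (results.get("documents") or [[]])[0]
    if not documents:
        return "No relevant passages found."
    ids = (results.get("ids") or [[]])[0]
    # metadatas/distances may be missing if excluded via `include=`
    metadatas = (results.get("metadatas") or [[]])[0] or [None] * len(documents)
    distances = (results.get("distances") or [[]])[0] or [None] * len(documents)

    passages = []
    for rank, (doc_id, doc, meta, dist) in enumerate(
        zip(ids, documents, metadatas, distances), start=1
    ):
        source = (meta or {}).get("source", "unknown")
        score = f"{dist:.3f}" if dist is not None else "n/a"
        passages.append(f"[{rank}] id={doc_id} source={source} distance={score}\n{doc}")
    return "\n\n".join(passages)
```

Calling `format_chroma_results(results)` on the `results` from the direct query produces the kind of text a tool would hand back to the agent.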
3. My question: I want the agent built with the OpenAI Agents SDK to search the knowledge vectors stored in Chroma DB (`collection`). However, the FileSearchTool in the agent from (1) only accepts vector_store_ids that refer to OpenAI storage. I don’t know how to define a tool for the OpenAI agent that searches an external vector store, in this case the collection stored in Chroma DB. How can I integrate the Agents SDK code above with the Chroma DB vector store? Could you show a pipeline or functions that connect the two? Thank you!
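One possible way to connect the two, sketched under assumptions: instead of FileSearchTool (which only accepts OpenAI-hosted `vector_store_ids`), expose the Chroma query as a custom function tool via the Agents SDK's `function_tool`, so the model calls a local Python function that runs `collection.query`. The names `make_search_function`, `search_guideline` and `build_chroma_agent`, the `n_results=8` default (mirroring `max_num_results=8` above), and the tool docstring are hypothetical. The `agents` import sits inside `build_chroma_agent` so the plain search function can be exercised with any stand-in object that has a `query` method.

```python
from typing import Any, Callable


def make_search_function(collection: Any, n_results: int = 8) -> Callable[[str], str]:
    """Build a search function bound to a Chroma collection (hypothetical helper).

    `collection` is expected to be a chromadb Collection, such as the
    "guideline" collection created with get_or_create_collection().
    """

    def search_guideline(query: str) -> str:
        """Search the guideline knowledge base stored in Chroma DB.

        Args:
            query: A natural-language search query.
        """
        results = collection.query(query_texts=[query], n_results=n_results)
        # Chroma returns one inner list per query text; we sent exactly one.
        documents = (results.get("documents") or [[]])[0]
        ids = (results.get("ids") or [[]])[0]
        if not documents:
            return "No relevant passages found."
        return "\n\n".join(f"[{doc_id}] {doc}" for doc_id, doc in zip(ids, documents))

    return search_guideline


def build_chroma_agent(collection: Any, instructions: str):
    """Wrap the search function as an Agents SDK tool and attach it to an Agent."""
    from agents import Agent, ModelSettings, function_tool  # openai-agents package

    # function_tool derives the tool name, description and JSON schema
    # from the function's name, docstring and type hints.
    search_tool = function_tool(make_search_function(collection))
    return Agent(
        name="Knowledge_Advisor",
        instructions=instructions,
        tools=[search_tool],
        # Same setting as the FileSearchTool agent: force a tool call.
        model_settings=ModelSettings(tool_choice="required"),
    )
```

With `collection` and `INSTRUCTIONS` from the question and an OpenAI API key configured, the agent would then be run with something like `Runner.run_sync(build_chroma_agent(collection, INSTRUCTIONS), "which vehicle is most expensive")`, reading the answer from `.final_output`. In this design retrieval runs locally, so the embeddings come from whatever embedding function the Chroma collection uses (all-MiniLM-L6-v2 here), and the model only sees the text the tool returns.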