Hello OpenAI Community,
I was fascinated by the recent blog post, “Inside our in-house data agent”. The architecture of Kepler, especially its ability to autonomously navigate complex datasets, is a huge inspiration for developers like me who are working on agentic systems.
Based on the article, I’ve been trying to piece together the underlying mechanics and have a couple of deeper, more technical questions that I’d love to get more clarity on. Any insights the community or the OpenAI team could provide would be immensely valuable.
Here are my questions:
1. On the RAG / Knowledge Search Mechanism:
The article mentions Kepler uses a “code-enriched data context” and performs a “knowledge search” to find relevant datasets. I’m inferring this is a sophisticated RAG system. I’m particularly curious about the interaction between the different knowledge sources.
• Retrieval Order & Prioritization: What is the typical sequence of retrieval? For instance, does Kepler perform a semantic search on a vector database of metadata/documents first, and then apply metadata filtering (e.g., based on time range, data quality scores)? Or is it a more integrated hybrid search?
• Knowledge Base Relation: How are the various components of the knowledge base—such as the structured metadata catalog, unstructured documents (mission docs), and the memory store (past interactions)—related and weighted during the retrieval process? Is there a hierarchy?
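To make the first question concrete, here is a toy sketch of the two-stage flow I have in mind: semantic search over embedded metadata first, followed by a metadata filter. Everything here (the `Doc` type, the `quality` field, the thresholds) is my own hypothetical stand-in, not anything from the article:

```python
# Hypothetical two-stage retrieval: semantic ranking, then metadata filtering.
# None of these names come from Kepler; this is just to anchor the question.
from dataclasses import dataclass, field
import math

@dataclass
class Doc:
    name: str
    embedding: list          # precomputed embedding of the dataset's docs
    metadata: dict = field(default_factory=dict)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_emb, docs, top_k=3, min_quality=0.5):
    # Stage 1: semantic search over metadata/document embeddings.
    ranked = sorted(docs, key=lambda d: cosine(query_emb, d.embedding), reverse=True)
    # Stage 2: metadata filtering (here: a data-quality score threshold).
    filtered = [d for d in ranked if d.metadata.get("quality", 0) >= min_quality]
    return filtered[:top_k]

docs = [
    Doc("trips_daily", [1.0, 0.1], {"quality": 0.9}),
    Doc("trips_raw",   [0.9, 0.2], {"quality": 0.3}),  # semantically close but low quality
    Doc("weather",     [0.1, 1.0], {"quality": 0.8}),
]
print([d.name for d in retrieve([1.0, 0.0], docs)])  # → ['trips_daily', 'weather']
```

My question is essentially whether the real system sequences the stages like this, inverts them (filter first, then rank), or fuses them into a single hybrid score.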
2. On the Advanced SQL Query Generation & Selection:
The process of generating and testing multiple SQL query candidates (“patterns”) to calculate metrics like “variance in trip duration” is a powerful concept for ensuring robustness.
• Rationale for Multiple Candidates: What is the primary motivation for generating multiple query patterns? Is it mainly to mitigate the risk of a single biased interpretation by the LLM, to explore different analytical paths simultaneously, or for another reason?
• Selection Criteria for the Optimal Query: How is the “best” query ultimately selected from the candidates? The article implies an execution-based approach. Is the selection based on a combination of factors like:
  • Successful execution (no errors)?
  • Performance metrics (e.g., query execution time from EXPLAIN ANALYZE)?
  • A “consistency verification” step where the result sets from different queries are compared?
  • Or perhaps a quality score that also considers query simplicity or similarity to previously successful queries from the memory store?
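To illustrate the selection criteria I'm asking about, here is a minimal sketch combining three of them: successful execution, consistency verification across candidates, and a simplicity tie-break. The SQLite table, the candidate texts, and the scoring rule are all my own assumptions for illustration, not Kepler's actual logic:

```python
# Hypothetical execution-based selection over pre-generated SQL "patterns"
# for population variance of trip duration. SQLite stands in for the warehouse.
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trips (id INTEGER, duration_min REAL);
    INSERT INTO trips VALUES (1, 10.0), (2, 20.0), (3, 30.0);
""")

candidates = [
    # Two equivalent formulations of population variance, plus one broken query.
    "SELECT AVG(duration_min*duration_min) - AVG(duration_min)*AVG(duration_min) FROM trips",
    "SELECT SUM((duration_min - sub.m)*(duration_min - sub.m)) / COUNT(*) "
    "FROM trips, (SELECT AVG(duration_min) AS m FROM trips) sub",
    "SELECT VARIANCE(duration_min) FROM trips",  # fails: SQLite has no VARIANCE()
]

# Criterion 1: keep only candidates that execute without errors.
results = {}
for sql in candidates:
    try:
        results[sql] = conn.execute(sql).fetchone()[0]
    except sqlite3.Error:
        continue

# Criterion 2: consistency verification — prefer the value most candidates agree on.
rounded = Counter(round(v, 6) for v in results.values())
consensus, votes = rounded.most_common(1)[0]

# Criterion 3: among agreeing queries, prefer the simplest (here: shortest text).
agreeing = [s for s, v in results.items() if round(v, 6) == consensus]
best = min(agreeing, key=len)
print(consensus, votes)  # variance of [10, 20, 30] is 66.666667, with 2 votes
```

What I'd love to know is which of these signals the real system actually weighs, and whether memory-store similarity enters the score at all.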
I believe understanding these architectural choices would be incredibly beneficial for the broader community building reliable and sophisticated data analysis agents.
Thank you for your time and for sharing such insightful work!
Best regards,
nozaki