How to perform Question Answering on Multiple JSON Files: Beyond SQL Query Generation with LLMs

Hello OpenAI Community,

I am currently exploring various approaches to implement a question-answering system on multiple JSON files. Here’s a quick summary of the methods I have already implemented:

  1. SQL Query Generation with LLMs:
  • I built a system where the user’s natural language query is converted into an SQL query using an LLM.
  • The generated SQL query is then executed on a relational database to fetch the relevant records.
  2. MongoDB Query Generation with LLMs:
  • Similarly, I developed a system where the user’s natural language query is translated into a MongoDB query using an LLM.
  • The generated MongoDB query is then executed on a MongoDB database to retrieve relevant data.
  3. LangChain JSON Toolkit:
  • Using LangChain’s JSON Toolkit agent, I implemented a solution to handle question answering on JSON files by generating relevant MongoDB queries via an LLM and executing them on the database.
  4. Combined JSON Data with Python Code Generation:
  • I combined all JSON file data and used an LLM to generate Python code dynamically to query the combined data programmatically.
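
To make the first approach concrete, here is a minimal sketch of the pipeline, not the exact implementation. It assumes flat JSON records loaded into an in-memory SQLite database. `load_records`, `generate_sql`, and `answer` are hypothetical names chosen for illustration, and `generate_sql` returns a fixed query where the real system would call the LLM with the question and the schema.

```python
import json
import sqlite3


def load_records(conn, table, records):
    """Create a table from a list of flat JSON objects and insert the rows."""
    columns = sorted({key for rec in records for key in rec})
    col_defs = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    rows = [tuple(rec.get(c) for c in columns) for rec in records]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)


def generate_sql(question, schema):
    """Hypothetical stand-in for the LLM call (assumption).

    A real version would send the question and schema to a model and
    return the SQL text it produces. Here the query is hard-coded.
    """
    return "SELECT name FROM orders WHERE total > 100 ORDER BY name"


def answer(conn, question):
    """Look up the schema, get SQL from the generator, and run it read-only."""
    schema = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    sql = generate_sql(question, schema)
    # Generated SQL is untrusted, so refuse anything that is not a SELECT.
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are executed")
    return conn.execute(sql).fetchall()


# One JSON "file", inlined as a string for the example.
orders_json = (
    '[{"name": "Ada", "total": 150},'
    ' {"name": "Bob", "total": 40},'
    ' {"name": "Cy", "total": 300}]'
)

conn = sqlite3.connect(":memory:")
load_records(conn, "orders", json.loads(orders_json))
result = answer(conn, "Which customers spent more than 100?")
print(result)
```

The SELECT-only check is a deliberately crude guard. Any setup that executes generated queries would likely also want a read-only connection or a restricted database user.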

While these methods have been effective, I am now seeking alternative approaches to enhance the robustness and versatility of my system. Specifically, I would like to explore methods that:

  • Do not rely solely on query generation.
  • Are suitable for handling unstructured or semi-structured JSON data.
  • Optimize for scalability and efficiency when dealing with large JSON datasets.
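
As an illustration of the kind of approach these criteria point to, here is a minimal sketch that flattens nested JSON into path/value pairs and ranks them against the question, with no query generation involved. Treat it as an assumption about one possible direction rather than a recommendation. `flatten` and `search` are hypothetical helper names, and the keyword-overlap score is a simple stand-in for whatever similarity measure (for example, embeddings) a real system would use.

```python
import json


def flatten(node, prefix=""):
    """Yield (path, value) pairs for every leaf in a nested JSON value."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten(value, f"{prefix}[{index}]")
    else:
        yield prefix, node


def search(documents, question, top_k=3):
    """Rank leaves by how many question terms appear in their path or value.

    Keyword overlap is an assumption made to keep the example small.
    """
    terms = {term.lower() for term in question.split()}
    scored = []
    for name, doc in documents.items():
        for path, value in flatten(doc):
            text = f"{path} {value}".lower()
            score = sum(1 for term in terms if term in text)
            if score:
                scored.append((score, name, path, value))
    # Highest score first, then file name and path for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return scored[:top_k]


# Two JSON "files", inlined as strings for the example.
documents = {
    "users.json": json.loads(
        '{"users": [{"name": "Ada", "city": "Paris"},'
        ' {"name": "Bob", "city": "Oslo"}]}'
    ),
    "orders.json": json.loads(
        '{"orders": [{"user": "Ada", "items": ["book", "pen"]}]}'
    ),
}

leaves = list(flatten(documents["users.json"]))
hits = search(documents, "city Oslo")
print(hits)
```

The retrieved path/value pairs could then be passed to the LLM as context, so the model answers from the data directly instead of writing a query against it.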

I would love to hear your suggestions, insights, and any other possible approaches or tools that could help me improve my system!

Looking forward to your ideas and recommendations.