Extracting information from a large set of documents

I’m working on a project where I need to extract key information from a large set of documents to answer user queries. After several months of using RAG (Retrieval-Augmented Generation), I’ve realized that it may not be the best approach for this task. Therefore, I’m considering the following solution:

  • An AI agent will systematically extract information from each document in the background and store the extracted data in a database.

  • My chatbot will then connect to the database and use Text2SQL to generate responses to user queries.
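
To make the first bullet concrete, here is a minimal sketch of the background extraction step, assuming a SQLite database and an invoice-like schema. The table name `invoices`, its columns, and the `extract_fields` helper are all hypothetical; in a real system `extract_fields` would call an LLM rather than parse "key: value" lines.

```python
import sqlite3


def extract_fields(text: str) -> dict:
    """Hypothetical stand-in for the AI extraction agent.

    A real version would prompt an LLM; this one only parses 'key: value' lines.
    """
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip().lower()] = value.strip()
    return fields


def ingest(conn: sqlite3.Connection, documents: dict) -> None:
    """Extract fields from every document and store one row per document."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices ("
        "doc_id TEXT PRIMARY KEY, supplier TEXT, "
        "invoice_number TEXT, invoice_date TEXT)"
    )
    for doc_id, text in documents.items():
        f = extract_fields(text)
        # Missing fields are stored as NULL instead of failing the whole run.
        conn.execute(
            "INSERT OR REPLACE INTO invoices VALUES (?, ?, ?, ?)",
            (doc_id, f.get("supplier"), f.get("invoice number"), f.get("date")),
        )
    conn.commit()


# Made-up sample documents, for illustration only.
docs = {
    "doc-1": "Supplier: Acme\nInvoice Number: INV-001\nDate: 2024-01-15",
    "doc-2": "Supplier: Globex\nInvoice Number: INV-002",
}
conn = sqlite3.connect(":memory:")
ingest(conn, docs)
rows = conn.execute(
    "SELECT doc_id, supplier, invoice_number, invoice_date "
    "FROM invoices ORDER BY doc_id"
).fetchall()
```

Using `INSERT OR REPLACE` keyed on the document ID means the background job can be re-run on the same documents without creating duplicate rows.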

If anyone has experience with similar requirements, I would appreciate your insights. Thanks in advance!


Hi @ramanan.iyer and welcome to the community!

This makes total sense. We’ve done something similar in the past, i.e. using GPT-4 to extract specific information and storing it either as a table or as a graph (Neo4j). Then on the query side, we would have a translation of natural language to some query language.

One tip I would suggest is to NOT make it super generic or freestyle, but to confine it to pre-defined types of responses and data you can get back.

A few things worked well for us in the past:

  • Using function calling, e.g. mapping a query such as “What is the invoice number on the last purchase from supplier X?” to a function get_invoice_num(date, supplier).
  • Translating queries to a set of filters, e.g. “I am looking for a freelance full-stack developer with great cloud infra skills who is based in Germany” would map to a set of filters like {"location": "Germany", "role": "fullstack engineer", "skills": ["cloud", "infrastructure", "terraform", "GCP", ...]}. These filters can then be applied to your table, graph, or whatever structure you use to store the data.
  • Implementing a graceful fallback in case the query is not understood or the data is not available.
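
As a rough illustration of the filter approach combined with a fallback, here is a small Python sketch. It assumes the LLM has already translated the user's query into a filter dictionary; the sample records, the `ALLOWED_FILTERS` set, and the function names are made up for this example and are not from our actual system.

```python
# Hypothetical in-memory records standing in for the table or graph.
CANDIDATES = [
    {"name": "A", "location": "Germany", "role": "fullstack engineer",
     "skills": ["cloud", "terraform", "react"]},
    {"name": "B", "location": "France", "role": "fullstack engineer",
     "skills": ["cloud", "GCP"]},
    {"name": "C", "location": "Germany", "role": "data scientist",
     "skills": ["python"]},
]

# Only pre-defined filter keys are accepted, nothing freestyle.
ALLOWED_FILTERS = {"location", "role", "skills"}

FALLBACK = "Sorry, I could not understand the request. Please rephrase it."
NO_DATA = "No matching records were found."


def apply_filters(records, filters):
    """Return the records that satisfy every filter."""
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    matches = []
    for record in records:
        ok = True
        for key, wanted in filters.items():
            if isinstance(wanted, list):
                # List-valued filter: the record must share at least one item.
                if not set(wanted) & set(record[key]):
                    ok = False
            elif record[key] != wanted:
                ok = False
        if ok:
            matches.append(record)
    return matches


def answer(filters):
    """Apply the filters, falling back gracefully on bad or empty input."""
    if not filters:
        return FALLBACK
    try:
        matches = apply_filters(CANDIDATES, filters)
    except ValueError:
        return FALLBACK
    if not matches:
        return NO_DATA
    return ", ".join(r["name"] for r in matches)


result = answer({
    "location": "Germany",
    "role": "fullstack engineer",
    "skills": ["cloud", "infrastructure", "terraform", "GCP"],
})
```

Treating a list-valued filter as "match any" is just one possible choice here; requiring all listed skills would be stricter and may suit some use cases better.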

A lot of effort also goes into putting guardrails and constraints on the conversation/query; this takes quite a bit of tuning to get right.

Hope this is helpful!


Hi @platypus
Thank you for your detailed feedback; it’s quite helpful.

To get started quickly with a POC, I was thinking of building the chatbot part as follows:

  1. Directly use https://www.text2sql.ai/ to generate a suitable SQL query from the user’s question (with some kind of fallback, of course).
  2. Execute the SQL.
  3. Pass the original question and SQL results to an LLM for generating the response.
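
The three steps above could be wired together roughly as in the sketch below. Both `generate_sql` and `compose_answer` are hypothetical placeholders: the first stands in for whatever text-to-SQL service is used (no real API of that service is shown here), the second for the LLM call. The SQLite table and its contents are made-up sample data.

```python
import sqlite3
from typing import Optional

FALLBACK = "Sorry, I could not answer that. Please try rephrasing the question."


def generate_sql(question: str) -> Optional[str]:
    """Hypothetical placeholder for step 1 (the text-to-SQL service).

    Returns None when the question cannot be translated.
    """
    if "invoice" in question.lower():
        return (
            "SELECT invoice_number FROM invoices "
            "WHERE supplier = 'Acme' ORDER BY invoice_date DESC LIMIT 1"
        )
    return None


def run_select(conn: sqlite3.Connection, sql: str) -> list:
    """Step 2: execute the SQL, refusing anything that is not a SELECT."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    return conn.execute(sql).fetchall()


def compose_answer(question: str, rows: list) -> str:
    """Hypothetical placeholder for step 3 (the LLM call).

    A real version would send the question and rows to an LLM.
    """
    return f"Question: {question}\nRows: {rows}"


def chatbot(conn: sqlite3.Connection, question: str) -> str:
    """Run the three steps, using the fallback if any of them fails."""
    sql = generate_sql(question)
    if sql is None:
        return FALLBACK
    try:
        rows = run_select(conn, sql)
    except (ValueError, sqlite3.Error):
        return FALLBACK
    return compose_answer(question, rows)


# Made-up sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (supplier TEXT, invoice_number TEXT, invoice_date TEXT)"
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [("Acme", "INV-001", "2024-01-15"), ("Acme", "INV-002", "2024-03-02")],
)
reply = chatbot(conn, "What is the invoice number on the last purchase from Acme?")
```

The SELECT-only check is a very coarse guardrail; for anything beyond a POC, a read-only database user would be a safer way to make sure generated SQL cannot modify data.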

Since this project is being developed for internal use only, we can also train our internal users on how to ask the right kind of questions. 🙂

Best regards,
Ramanan

I use AI to create music playlists. I have a very peculiar way of playing music: as I have never beatmatched music, I developed a key sequence to play. Here is one of many: key 01, key 06, key 03, key 08, key 05, key 10, key 07, key 12, key 09, key 06, key 11, key 08, key 05, key 02, key 07, key 04. In music there are major sounds and minor sounds, and I combine them. I give the AI a large set of music and organise it the way I like, then I use a program that applies the algorithm of the list and rearranges the tracks in a much better way. AI is here, and there is so much to learn for both humans and AI. Long live AI!