Using large PDFs to make a ChatBot

First, parse the tabular data properly. There are packages for extracting tables from PDFs (pdfplumber and Camelot, for example). After that, you can go several ways:

  1. Put the tabular data straight into your vector DB. Make sure each table stays within a single chunk, together with its header row.
  2. Take each value in the table and generate an explanation for it. That way the LLM knows exactly what each value means.
  3. Use a knowledge graph, something like Neo4j. This ensures that each value in your table (and your other data in general) is explicitly linked to the data it relates to.
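
To make the three options more concrete, here is a rough sketch in plain Python. It assumes the table has already been parsed into a list of rows with the header as the first row and the first column acting as the row key. The function names, the sentence template, and the sample data are all made up for illustration. For option 2 a fixed template stands in for the explanation; in practice you would probably ask an LLM to write it. For option 3 the triples are only the raw material you would then load into a graph store.

```python
from typing import List, Tuple

# Assumption: a parsed table is a list of rows, and the first row is the header.
Table = List[List[str]]


def table_to_chunk(title: str, table: Table) -> str:
    """Option 1: render the whole table as one chunk, header included."""
    header, *rows = table
    lines = [
        f"Table: {title}",
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)


def cell_explanations(title: str, table: Table) -> List[str]:
    """Option 2: one sentence per value (template stand-in for an LLM)."""
    header, *rows = table
    sentences = []
    for row in rows:
        key = row[0]  # first column identifies the row
        for column, value in zip(header[1:], row[1:]):
            sentences.append(
                f"In table '{title}', the {column} for {header[0]} {key} is {value}."
            )
    return sentences


def table_to_triples(table: Table) -> List[Tuple[str, str, str]]:
    """Option 3: (row key, column name, value) triples for a graph store."""
    header, *rows = table
    return [
        (row[0], column, value)
        for row in rows
        for column, value in zip(header[1:], row[1:])
    ]


# Hypothetical sample data, not taken from any real PDF.
example = [
    ["Product", "Price", "Stock"],
    ["Widget", "9.99", "120"],
    ["Gadget", "24.50", "8"],
]

if __name__ == "__main__":
    print(table_to_chunk("Inventory", example))
    print(cell_explanations("Inventory", example))
    print(table_to_triples(example))
```

The output of `table_to_chunk` is what you would embed as a single chunk, each string from `cell_explanations` can be embedded on its own, and each triple from `table_to_triples` maps to one relationship between a row node and a value node.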

Depending on your needs, one of these will work better than the others.