Table extraction for LangChain and vectorstore

Hello, I want to read information from my documents and make it available in a chat, but many of my PDF files contain tables. How should I handle them, given that the chat model seems to be bad at reading them? I'm thinking about extracting the tables separately. Do you have any ideas, or have you run into the same problem? Extracting them from the PDFs is difficult, and they don't share a single structure: each one is different.


The tables look like this (this is only half of one; the second part is on the next page)

If you’re a programmer, you might want to have a look at pypdf or PyMuPDF.
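
To make that a bit more concrete, here is a rough sketch of how it could look with PyMuPDF. It assumes PyMuPDF 1.23 or newer (which provides `page.find_tables()`), and the helper names `extract_tables`, `merge_split_tables` and `rows_to_markdown` are made up for this example. The merge step is only a crude heuristic for tables that continue on the next page: it joins consecutive tables with the same number of columns, which may be wrong for your files. Converting each table to Markdown text before embedding is one option, not the only one.

```python
from typing import List, Optional

Row = List[Optional[str]]
Table = List[Row]


def extract_tables(pdf_path: str) -> List[Table]:
    """Collect every table PyMuPDF detects, in page order."""
    import fitz  # PyMuPDF; find_tables() needs version 1.23 or newer

    tables: List[Table] = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            for table in page.find_tables().tables:
                tables.append(table.extract())  # list of rows, cells are str or None
    return tables


def merge_split_tables(tables: List[Table]) -> List[Table]:
    """Heuristic (an assumption, not a PyMuPDF feature): treat a table as the
    continuation of the previous one when both have the same column count."""
    merged: List[Table] = []
    for rows in tables:
        if not rows:
            continue
        if merged and len(rows[0]) == len(merged[-1][0]):
            # Skip the first row if it just repeats the header of the previous part.
            start = 1 if rows[0] == merged[-1][0] else 0
            merged[-1].extend(rows[start:])
        else:
            merged.append([list(r) for r in rows])
    return merged


def rows_to_markdown(rows: Table) -> str:
    """Render a table as Markdown text, using the first row as the header."""
    def clean(cell: Optional[str]) -> str:
        # Empty cells become "", line breaks collapse to spaces, pipes are escaped.
        return "" if cell is None else " ".join(str(cell).split()).replace("|", "\\|")

    width = max(len(r) for r in rows)
    padded = [[clean(c) for c in r] + [""] * (width - len(r)) for r in rows]
    lines = [
        "| " + " | ".join(padded[0]) + " |",
        "| " + " | ".join(["---"] * width) + " |",
    ]
    lines += ["| " + " | ".join(r) + " |" for r in padded[1:]]
    return "\n".join(lines)
```

You would then call something like `[rows_to_markdown(t) for t in merge_split_tables(extract_tables("my.pdf"))]` (the file name is a placeholder) and add each resulting string to your vectorstore as its own document, so a table is never cut in half by the text splitter.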
Here’s a benchmark: