Hi, does anyone know of a solution that can understand the layout of a PDF and then extract the content based on that layout?


There are a couple of solutions on GitHub, and some plugins in the paid version of ChatGPT do this as well.

If you’re looking for a solution that can understand the layout of a PDF and extract content based on that layout, you might want to consider Tabula or Apache PDFBox. Tabula is built for pulling tables out of PDFs, while PDFBox is a Java library that can extract text along with its position on the page, so you can rebuild structure from the layout.
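
To give a rough idea of what "extracting based on layout" means once you have positions, here is a minimal Python sketch. It assumes you already have words with page coordinates from some extraction tool; the `Word` class, the `group_into_rows` helper, and the sample coordinates are all made up for illustration and are not the output format of Tabula or PDFBox.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical record: one word plus the position of its top-left corner.
    text: str
    x: float  # distance from the left edge of the page
    y: float  # distance from the top edge of the page


def group_into_rows(words, y_tolerance=3.0):
    """Cluster words into rows by vertical position, then order each row left to right."""
    rows = []
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        # A word joins the current row if it sits close enough to the row's first word.
        if rows and abs(rows[-1][0].y - word.y) <= y_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    return [[w.text for w in sorted(row, key=lambda w: w.x)] for row in rows]


# Made-up sample data: a small table whose words arrive out of reading order.
words = [
    Word("Total", 50, 120.5), Word("Name", 50, 100), Word("Qty", 200, 100.8),
    Word("Widget", 50, 110), Word("3", 200, 110.4), Word("3", 200, 120),
]
rows = group_into_rows(words)
for row in rows:
    print(row)
```

The `y_tolerance` value is a guess that would need tuning per document, and real PDFs with multiple columns or rotated text need more than this simple row clustering.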

