Hi, I am trying to parse several PDF files while keeping the overall structure of each file for context, such as “headers”, “subheaders”, “normal text”, etc.
I have searched thoroughly for Python libraries and examples, but no library is ‘perfect’, in the sense of preserving the hierarchy of elements in the text.
For instance, a book chapter titled “How to find the answer for problem x” is probably more important than an ordinary line of body text reading “how to find the answer for problem x”, even though the words are the same.
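To make the distinction concrete, here is a minimal sketch of the kind of labeling I mean, in plain Python. It assumes the extractor already returns each text span with its font size. The `Span` type and `classify_spans` helper are hypothetical and not from any library. It uses a crude heuristic: the most common size is body text, the largest size is a header, and anything in between is a subheader.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical input: one piece of extracted text plus its font size.
    text: str
    font_size: float


def classify_spans(spans):
    """Label spans as 'header', 'subheader' or 'body' using font size only.

    Assumed heuristic: the most frequent size is body text, the largest
    size is a header, and sizes strictly between the two are subheaders.
    """
    if not spans:
        return []
    sizes = Counter(s.font_size for s in spans)
    body_size = sizes.most_common(1)[0][0]  # most frequent size
    largest = max(sizes)
    labeled = []
    for s in spans:
        if s.font_size <= body_size:
            label = "body"  # body text, or smaller (e.g. footnotes)
        elif s.font_size >= largest:
            label = "header"
        else:
            label = "subheader"
        labeled.append((label, s.text))
    return labeled


# Same words, different roles: only the font size tells them apart here.
example = [
    Span("How to find the answer for problem x", 18.0),
    Span("Background", 14.0),
    Span("Some text.", 11.0),
    Span("how to find the answer for problem x", 11.0),
    Span("More text.", 11.0),
]

for label, text in classify_spans(example):
    print(label, "|", text)
```

A real document would need more signals than font size (bold flags, numbering, position on the page), which is exactly the part I have not found done well in existing libraries.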
Thanks for sending the other topics. I had already seen them, but I was looking for something “fresher”.