Extracting data from a rent roll PDF

Personally, I would convert the PDFs to Markdown (text) and then feed that to the LLM for extraction, using structured outputs to get JSON. Then feed the JSON to a traditional program for analysis, or to another LLM call if needed.

For the first step, converting the PDF to Markdown, I just rent a cheap cloud-based A100 to do the processing needed, or you can try your luck with an API that does this. The rest is handled by the LLM API, plus whatever program you want in between or afterwards for residual processing.
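
As a rough sketch of the part after the Markdown conversion, something like the following could work. Everything named here is an assumption for illustration: the schema fields (`unit`, `tenant`, `monthly_rent`), the `call_llm` callable (a stand-in for whichever LLM API you use with structured outputs), and the choice to count rent only for occupied units. The fake LLM at the bottom just returns canned JSON so the flow can be run without any API.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical JSON Schema to hand to the LLM's structured-output feature.
# Field names are illustrative; adapt them to the columns in your rent rolls.
RENT_ROLL_SCHEMA = {
    "type": "object",
    "properties": {
        "units": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "unit": {"type": "string"},
                    "tenant": {"type": ["string", "null"]},
                    "monthly_rent": {"type": "number"},
                },
                "required": ["unit", "tenant", "monthly_rent"],
                "additionalProperties": False,
            },
        }
    },
    "required": ["units"],
    "additionalProperties": False,
}


@dataclass
class Unit:
    unit: str
    tenant: Optional[str]  # None means the unit is vacant
    monthly_rent: float


def extract_units(markdown: str, call_llm: Callable[[str, dict], str]) -> List[Unit]:
    """Send the Markdown plus schema to the LLM and parse the JSON it returns.

    `call_llm` is a placeholder: wrap your real API client in a function
    that takes (markdown, schema) and returns a JSON string.
    """
    data = json.loads(call_llm(markdown, RENT_ROLL_SCHEMA))
    return [
        Unit(row["unit"], row["tenant"], float(row["monthly_rent"]))
        for row in data["units"]
    ]


def summarize(units: List[Unit]) -> dict:
    """The 'traditional program' step: plain Python over the extracted JSON."""
    occupied = [u for u in units if u.tenant]
    return {
        "unit_count": len(units),
        "occupied": len(occupied),
        # Assumption: only occupied units contribute to collected rent.
        "total_monthly_rent": sum(u.monthly_rent for u in occupied),
    }


def fake_llm(markdown: str, schema: dict) -> str:
    """Canned response standing in for a real LLM call."""
    return json.dumps({
        "units": [
            {"unit": "101", "tenant": "A. Tenant", "monthly_rent": 1200.0},
            {"unit": "102", "tenant": None, "monthly_rent": 1150.0},
        ]
    })


summary = summarize(extract_units("| Unit | Tenant | Rent |", fake_llm))
print(summary)
```

Passing the LLM call in as a function keeps the parsing and analysis code independent of any particular provider, so you can swap APIs or test with canned data.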
