The problem is that ChatGPT fails to read the text in scanned files properly.
I have attached the same page twice, taken from a PDF of an old book. The first image is the original page from the PDF (the writing is hard to read); the second is the same page after it was saved from the PDF as a TIFF. You will see the difference immediately.
Let’s say that I have a PDF whose page images are not clear, because the book is old or damaged, or because of printing problems. Many such books have writing that is barely visible, almost erased. To make it much more legible, you have to save each page of the PDF as a .tiff image.
This will make the writing darker and much clearer. Only then run OCR on the image. In Adobe Acrobat Pro: Save as Other - Image - TIFF. All PDF pages will be automatically saved as TIFF, and each page will be seen 68% clearer. Any AI tool should do the same before running OCR, because the writing will be much easier to read.
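To show what "darker and clearer" can mean in practice, here is a minimal sketch of a contrast stretch on a grayscale page. It is only my assumption that the improvement seen after the TIFF export comes from something like this; I do not know what Acrobat actually does internally. The function name `stretch_contrast` and its percentile parameters are made up for this example, and the page is represented as a plain list of rows with values from 0 (black) to 255 (white).

```python
def stretch_contrast(pixels, low_pct=2, high_pct=98):
    """Linearly rescale grayscale values so faint text becomes darker.

    pixels: list of rows, each value from 0 (black) to 255 (white).
    low_pct / high_pct: percentiles mapped to pure black / pure white
    (hypothetical defaults, tune them for your own scans).
    """
    flat = sorted(v for row in pixels for v in row)
    lo = flat[(len(flat) - 1) * low_pct // 100]
    hi = flat[(len(flat) - 1) * high_pct // 100]
    if hi == lo:
        # Completely flat image: nothing to stretch, return a copy.
        return [row[:] for row in pixels]

    def scale(v):
        v = min(max(v, lo), hi)  # clip outliers to the chosen range
        return round((v - lo) * 255 / (hi - lo))

    return [[scale(v) for v in row] for row in pixels]


# A washed-out page: "ink" at 150, "paper" around 200-220.
faded = [[200, 210, 220],
         [150, 205, 215]]
clearer = stretch_contrast(faded)
```

After the stretch, the darkest value on the page is mapped to black and the lightest to white, so the gap between ink and paper is much wider than in the faded original.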
As a basic idea, if you want to do a quality job, for example if you are running OCR over thousands of PDF files, then in order to correctly extract the text from a PDF you have to process each page several times, like this: first with thin text, then with thick text, then without shadows, then with more brightness.
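The four passes above could be sketched roughly like this. It is a simplified illustration, not a finished tool: all function names, the 3x3 neighborhood, the shadow cutoff and the brightness amount are my own assumptions. The page is again a list of rows with grayscale values from 0 (black) to 255 (white), and the OCR step itself is left out; each variant would be given to whatever OCR engine you use and the results compared.

```python
def _neighborhood(pixels, y, x):
    """Values of the 3x3 block around (y, x), cut off at the page edges."""
    h, w = len(pixels), len(pixels[0])
    return [pixels[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]


def _filter(pixels, pick):
    """Replace every pixel with pick(neighborhood), e.g. min or max."""
    return [[pick(_neighborhood(pixels, y, x)) for x in range(len(pixels[0]))]
            for y in range(len(pixels))]


def thin(pixels):
    # Lightest neighbor wins, so dark strokes shrink.
    return _filter(pixels, max)


def thicken(pixels):
    # Darkest neighbor wins, so dark strokes grow.
    return _filter(pixels, min)


def drop_shadows(pixels, cutoff=160):
    # Crude shadow removal: anything lighter than the cutoff becomes white.
    return [[255 if v >= cutoff else v for v in row] for row in pixels]


def brighten(pixels, amount=40):
    # Add a constant to every pixel, capped at white.
    return [[min(255, v + amount) for v in row] for row in pixels]


def make_variants(pixels):
    """Return the four versions of a page, in the order listed in the text."""
    return {
        "thin": thin(pixels),
        "thick": thicken(pixels),
        "no_shadows": drop_shadows(pixels),
        "brighter": brighten(pixels),
    }


page = [[255, 255, 255],
        [255,   0, 255],
        [255, 255, 255]]
variants = make_variants(page)
```

The idea is that no single setting works for every damaged page, so you produce several versions and keep the OCR result that looks best for each page.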
Original image, bad quality:
Same image, good quality:
Test with ChatGPT: see if it can correctly identify the text from the original page (with bad quality), then see how well it identifies the text from the same PDF page saved in .tiff format.