I’m attaching scanned images of documents and would like to know more about how images are handled behind the scenes.
For us, a typical scanned image is about 2550x3300 pixels, but my understanding is that images are resized to something like 1200x750, which means a document page that was 3300 pixels tall is now only 750.
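For concreteness, here is a minimal sketch of the kind of aspect-ratio-preserving downscale I assume is happening. The target of 768 for the shorter side is my assumption, not a documented figure; the point is just how much height a 2550x3300 page loses under any such cap:

```python
def fit_short_side(w, h, target=768):
    # Assumed resize rule: scale the image so its SHORTER side
    # equals `target`, preserving aspect ratio. The 768 value is
    # a guess at the model's working resolution, not documented fact.
    scale = target / min(w, h)
    return round(w * scale), round(h * scale)

# Our typical 2550x3300 page:
print(fit_short_side(2550, 3300))  # → (768, 994)
```

Either way, a full page ends up with far fewer pixels per line of text than the original scan had.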
Some of these are contracts over 100 years old, and both the document and the scanned image can be poor quality, so good resolution is critical. When the height is reduced to 750, we start getting a lot of mistakes.
First, I would like to know what actually happens to the image; then I'm considering rotating the document 90 degrees to better match the target dimensions.
Thanks in advance for any information you can offer.
Thanks. How about if I create the tiles myself, so each tile is 768x768, and instruct GPT how to assemble them? That little bit of extra resolution makes a big difference in accuracy. Since we are processing birth certificates for a county, getting the child's name correct is essential.
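For what it's worth, here is a rough sketch of how I'd compute the tile boundaries myself. The 768x768 tile size mirrors the resolution I mentioned above; letting the last row/column be smaller (rather than padding) is my own choice, not a requirement of any API:

```python
def tile_boxes(w, h, tile=768):
    # Hypothetical tiling scheme: cover a w x h image with tile x tile
    # crop boxes (left, top, right, bottom). Edge tiles are simply
    # smaller instead of being padded out to the full tile size.
    boxes = []
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            boxes.append((left, top, min(left + tile, w), min(top + tile, h)))
    return boxes

# A 2550x3300 page at full resolution would need 4 columns x 5 rows:
print(len(tile_boxes(2550, 3300)))  # → 20
```

Each box could then be passed straight to something like Pillow's `Image.crop`, which takes exactly this (left, top, right, bottom) tuple, and the crops sent as separate images.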
FYI, I was able to split a document image into top and bottom halves, so the dimensions fit the horizontal dimensions that GPT resizes images to much better, and everything was optimized for better resolution. GPT combined the two images with no problem, and we ended up getting much better results. This leads me to believe that, if we needed to, we could tile the images into n tiles ourselves, so GPT would effectively be working with a very high-resolution image.