Hello,
I am using ChatGPT-4o to read construction plans and extract the information from them.
The plan is on A0+ paper; as an image it is 9708×5085 pixels.
I tried uploading a PDF, but it says that it can’t read text from the PDF.
I tried uploading a JPG, but it gets resized and most of the information is lost.
A zip file of the same image was recommended, but it didn’t give good results.
I tried uploading a TIFF, but after a long loading time I got this response:
“It looks like my system is having repeated issues processing the OCR step due to technical constraints with large image sizes.”
I also tried DWG, PLT, and DXF.
I feel like the JPG or TIFF would be the best solution, but they might be too heavy to analyze? The chat’s recommendations start going in circles and I am out of options.
I could cut the image up, or write a program that does it, but the question is: do the cuts need to be “pretty” and avoid cutting the plans mid-line, or can I just cut it into equal “boxes” that divide it into segments?
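For reference, the "equal boxes" option can be sketched in a few lines of Python. This is only a minimal sketch under assumptions: the function name `tile_boxes`, the 2048-pixel tile size, and the optional overlap are illustrative choices, not limits or settings documented for ChatGPT. It only computes the crop rectangles; it does not open or cut any image itself.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower) in pixels


def _starts(length: int, tile: int, step: int) -> List[int]:
    """Start offsets along one axis; stop once a tile reaches the far edge."""
    starts = [0]
    while starts[-1] + tile < length:
        starts.append(starts[-1] + step)
    return starts


def tile_boxes(width: int, height: int, tile: int, overlap: int = 0) -> List[Box]:
    """Split a width x height image into a row-major grid of crop boxes.

    `tile` is the nominal tile edge and `overlap` is how many pixels
    neighbouring tiles share. Both are hypothetical parameters for this sketch.
    Tiles on the right and bottom edges are clipped to the image size.
    """
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    step = tile - overlap
    boxes = []
    for top in _starts(height, tile, step):
        for left in _starts(width, tile, step):
            boxes.append((left, top, min(left + tile, width), min(top + tile, height)))
    return boxes


# The plan from the question: 9708 x 5085 pixels.
plain = tile_boxes(9708, 5085, tile=2048)                 # 5 columns x 3 rows
shared = tile_boxes(9708, 5085, tile=2048, overlap=256)   # 6 columns x 3 rows
```

Each box uses the (left, upper, right, lower) layout that Pillow's `Image.crop` accepts, so the actual cutting could be done with that library. A non-zero overlap means a wall or label sliced by one tile edge may still appear whole in the neighbouring tile, which is one cheap way to approximate "pretty" cuts without analysing the drawing.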
I would recommend ensuring that the segments can stand on their own and don’t require relative spatial alignment.
While the model ‘could’ potentially patch things together, you have to remember that these things are fundamentally mostly one-dimensional creatures.
If you do slice stuff down the middle, you may need to come up with a special chain-of-thought (CoT) prompt that lets the model navigate your snippets to answer your questions, so that it reasons along these lines:
“OK, so now I’m in cell C4 in image 1, which appears to be cut off. I would expect it to continue with D4 in image 2. It seems like together, these two cells form a bathroom.”
Something like that. I just pulled that out of my behind, but you get the gist.
You likely need a prompt that stimulates this type of behavior, but it might be prone to hallucinations.
Really depends on the level of quality you’re after. Cleaner data yields cleaner results.
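To make the cell-navigation idea a bit more concrete, here is a rough sketch of how each slice could be given a spreadsheet-style label (C4, D4, ...) plus a short note naming its neighbours, to be pasted into the prompt alongside the image. Everything here is an assumption for illustration: the helper names `tile_label` and `neighbour_note`, the note wording, and the choice to label whole tiles rather than cells inside a tile. It has not been tested against the model and will not rule out hallucinations.

```python
from string import ascii_uppercase


def tile_label(row: int, col: int) -> str:
    """Spreadsheet-style label: column letter, then 1-based row number."""
    if row < 0 or not 0 <= col < len(ascii_uppercase):
        raise ValueError("this sketch supports rows >= 0 and columns A to Z only")
    return f"{ascii_uppercase[col]}{row + 1}"


def neighbour_note(row: int, col: int, rows: int, cols: int) -> str:
    """Build a hypothetical prompt snippet describing where a tile sits in the grid."""
    neighbours = []
    if col > 0:
        neighbours.append(f"left: {tile_label(row, col - 1)}")
    if col < cols - 1:
        neighbours.append(f"right: {tile_label(row, col + 1)}")
    if row > 0:
        neighbours.append(f"above: {tile_label(row - 1, col)}")
    if row < rows - 1:
        neighbours.append(f"below: {tile_label(row + 1, col)}")
    return (
        f"This image is tile {tile_label(row, col)} of a grid with "
        f"{cols} columns and {rows} rows. Neighbours: {', '.join(neighbours)}. "
        "If a room, wall, or label is cut off at an edge, name the neighbouring "
        "tile where it should continue instead of guessing the missing part."
    )


# Example: the tile in the 4th row, 3rd column of a 5-row, 6-column grid.
note = neighbour_note(row=3, col=2, rows=5, cols=6)
```

The point of the note is to give the model an explicit map, so that it can say "continues in D4" rather than inventing what lies beyond the cut.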
I tried breaking it down with some logic and it seems to get better results. However, I want this to be as automatic as possible. Would it be reasonable to use an AI tool to split the images?
OK, so this partially works, but it requires a lot of step-by-step explanation to get the results I am aiming for, and it would be hard to trust it even semi-blindly.
I was wondering if I could assemble a set of example plans to build some sort of machine learning mechanism.