Estimation of measurements in images or PDFs

barslan · December 28, 2023, 3:15pm

Hello, I want to send an A0-size PDF to ChatGPT. Let’s assume this PDF contains a 2D drawing of a square at 1:50 scale. I want ChatGPT to tell me the dimensions of the square as drawn on the page, using the A0 sheet dimensions as a reference, and then multiply the result by the scale factor to find the actual dimensions of the square. Is there an extension that can do this? If not, how can I solve this?

EricGT · December 28, 2023, 3:51pm

@barslan Welcome to the forum!

I wouldn’t recommend using any AI in this process. Any AI involved in the processing chain could generate hallucinations, leading to incorrect calculations.

The traditional approach involves reading the PDF using advanced software capable of extracting metadata. It’s crucial that this metadata is accurate and provides the necessary details, including the unit of measurement and its relation to the specific image. Afterward, programming can be used to perform the required calculations and present the information.

It’s important to note that when using PDF text extraction software, the metadata may not be extracted in a useful way. This can lead to a disconnect between the image as a reference and the later extracted text containing measurement information.
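As a rough sketch of the calculation step described above: PDF page geometry is expressed in points (1 pt = 1/72 inch) by default, so once a length has been read from the file it can be converted to millimetres and multiplied by the drawing scale. The function names and the 1:50 example value are illustrative assumptions, not part of any PDF library, and the `user_unit` parameter only matters for files that set the optional `/UserUnit` page entry.

```python
# Assumption: lengths are given in default PDF user-space units (points).
MM_PER_POINT = 25.4 / 72  # 1 pt = 1/72 inch, 1 inch = 25.4 mm


def points_to_mm(points, user_unit=1.0):
    """Convert a length in PDF points to millimetres on paper."""
    return points * user_unit * MM_PER_POINT


def real_size_mm(drawn_points, scale_denominator, user_unit=1.0):
    """Scale a drawn length up to real-world size, e.g. 50 for a 1:50 drawing."""
    return points_to_mm(drawn_points, user_unit) * scale_denominator


# Hypothetical example: a square side drawn 100 mm long on the sheet.
side_pt = 100 / MM_PER_POINT
print(round(points_to_mm(side_pt), 2), "mm on paper")
print(round(real_size_mm(side_pt, 50) / 1000, 3), "m in reality")
```

The point is that this part is plain arithmetic and does not need an AI model; the hard part is obtaining a trustworthy drawn length from the PDF in the first place.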

EDIT

Looked into this a bit more by creating a PDF with an outline of a square.

I know it is not exactly what you noted but good enough to do some rough experiments.

Using

I asked a few questions, but it never indicated that it could identify a square in the image.

However, this did reveal some useful information:

You
Tell me as much as you can about the file including the meta data.

ChatGPT
The metadata extracted from the “square.pdf” file provides the following information:

Author: Not specified
Creator: TeX (a typesetting system)
Producer: pdfTeX-1.40.25 (a version of TeX used to produce PDF documents)
Subject: Not specified
Title: Not specified
Number of Pages: 1

The file appears to have been created using TeX, which is often used for technical and scientific documentation due to its capabilities in producing precise and high-quality documents. The fact that it’s a single-page document might align with your query about a square line drawing, as such drawings are typically not extensive. However, without the ability to view the contents directly, this is as much as can be determined from the metadata.

Clicking on

```python
import PyPDF2

# Define the file path to the PDF file
file_path = '/mnt/data/square.pdf'

# Opening the file in binary read mode
with open(file_path, 'rb') as file:
    # Initialize the PDF file reader
    reader = PyPDF2.PdfFileReader(file)

    # Gathering metadata
    metadata = reader.getDocumentInfo()
    num_pages = reader.getNumPages()

    # Compiling metadata information
    meta_info = {
        "Author": metadata.author,
        "Creator": metadata.creator,
        "Producer": metadata.producer,
        "Subject": metadata.subject,
        "Title": metadata.title,
        "Number of Pages": num_pages
    }

meta_info
```

Result

```python
{'Author': None,
 'Creator': 'TeX',
 'Producer': 'pdfTeX-1.40.25',
 'Subject': None,
 'Title': None,
 'Number of Pages': 1}
```

Checking PyPDF2

PyPDF2 is a free and open-source pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files. PyPDF2 can retrieve text and metadata from PDFs as well.

This aligns with what is noted in the OpenAI File uploads FAQ.

_j · December 28, 2023, 3:53pm

Obtaining dimensions and bounding boxes from AI vision is a skill called grounding.

You can, for example, see how Azure can augment gpt-4-vision with their own vision products.

Other AI vision products like MiniGPT-v2 - a Hugging Face Space by Vision-CAIR can demonstrate grounding and identification.

Such metrics are needed as a basis for measurement.

gpt-4-vision alone might give you a description and be coaxed into extrapolation, but it is unlikely to be reliable.
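To illustrate why a bounding box is the basis for measurement, here is a minimal sketch that turns a pixel bounding box into real-world dimensions. It assumes the image is a render of the full A0 page with no cropping or margins added, that the box comes from some grounding-capable model as `(x0, y0, x1, y1)` in pixels, and that the drawing scale is known. The function name and all example numbers are hypothetical.

```python
# Assumption: ISO 216 A0 sheet in portrait orientation, in millimetres.
A0_MM = (841.0, 1189.0)


def bbox_to_real_mm(bbox_px, image_size_px, page_size_mm, scale_denominator):
    """Convert a pixel bounding box to real-world width and height in mm.

    bbox_px: (x0, y0, x1, y1) in pixels of the rendered page image.
    image_size_px: (width, height) of that image in pixels.
    page_size_mm: (width, height) of the physical sheet in mm.
    scale_denominator: e.g. 50 for a 1:50 drawing.
    """
    x0, y0, x1, y1 = bbox_px
    img_w, img_h = image_size_px
    page_w, page_h = page_size_mm
    # Fraction of the page covered by the box, mapped to mm on paper.
    paper_w = (x1 - x0) / img_w * page_w
    paper_h = (y1 - y0) / img_h * page_h
    return paper_w * scale_denominator, paper_h * scale_denominator


# Hypothetical example: page rendered at 2 px per mm, box 200 px on each side.
width_mm, height_mm = bbox_to_real_mm(
    (200, 300, 400, 500), (1682, 2378), A0_MM, 50
)
print(round(width_mm), round(height_mm))
```

Any error in the box coordinates is multiplied by the scale, so a few pixels of grounding error can translate into a noticeable error in the final dimensions.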

barslan · January 2, 2024, 11:27am

I understand, thank you very much. What you said has given me an idea. I will research this too.
