Estimation of measurements in images or PDFs

Hello, I want to send an A0-size PDF to ChatGPT. Let’s assume this PDF contains a 2D drawing of a square at 1:50 scale. I want ChatGPT to tell me the dimensions of the square on the page, using the A0 dimensions as a reference, and then multiply the result by the scale to find the square’s actual dimensions. Is there an extension that can do this? If not, how can I solve this?

@barslan Welcome to the forum!

I wouldn’t recommend using any AI in this process. Any AI involved in the processing chain could generate hallucinations, leading to incorrect calculations.

The traditional approach involves reading the PDF using advanced software capable of extracting metadata. It’s crucial that this metadata is accurate and provides the necessary details, including the unit of measurement and its relation to the specific image. Afterward, programming can be used to perform the required calculations and present the information.
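
As a rough illustration of the calculation step, here is a minimal sketch. It assumes the side length of the square has already been read from the PDF in points, and that the file uses the default PDF unit of 1/72 inch per point. The function names and the example value are hypothetical and do not come from any PDF library.

```python
# Sketch only: helper names and the example value are illustrative assumptions.
PT_PER_INCH = 72.0   # default PDF user-space unit: 1 pt = 1/72 inch
MM_PER_INCH = 25.4


def points_to_mm(length_pt: float) -> float:
    """Convert a length in PDF points to millimetres on paper."""
    return length_pt * MM_PER_INCH / PT_PER_INCH


def real_length_mm(length_pt: float, scale_denominator: float) -> float:
    """Real-world length of a line drawn at 1:scale_denominator."""
    return points_to_mm(length_pt) * scale_denominator


# Assumed example: the square's side measures 283.46 pt on the page.
side_pt = 283.46
paper_mm = points_to_mm(side_pt)
real_mm = real_length_mm(side_pt, 50)  # 1:50 drawing
print(f"on paper: {paper_mm:.1f} mm, real: {real_mm / 1000:.2f} m")
```

The arithmetic itself is deterministic; the hard part is getting a trustworthy length out of the PDF in the first place.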

It’s important to note that when using PDF text extraction software, the metadata may not be extracted in a useful way. This can lead to a disconnect between the image as a reference and the later extracted text containing measurement information.


I looked into this a bit more by creating a PDF containing the outline of a square.


I know this is not exactly what you described, but it is good enough for some rough experiments.


I asked ChatGPT a few questions, but it never indicated that it could identify a square in the image.

However, this prompt revealed some useful information:

Tell me as much as you can about the file including the meta data.

The metadata extracted from the “square.pdf” file provides the following information:

  • Author: Not specified
  • Creator: TeX (a typesetting system)
  • Producer: pdfTeX-1.40.25 (a version of TeX used to produce PDF documents)
  • Subject: Not specified
  • Title: Not specified
  • Number of Pages: 1

The file appears to have been created using TeX, which is often used for technical and scientific documentation due to its capabilities in producing precise and high-quality documents. The fact that it’s a single-page document might align with your query about a square line drawing, as such drawings are typically not extensive. However, without the ability to view the contents directly, this is as much as can be determined from the metadata.

Clicking on the image shows the code that was run:

import PyPDF2

# Define the file path to the PDF file
file_path = '/mnt/data/square.pdf'

# Open the file in binary read mode
with open(file_path, 'rb') as file:
    # Initialize the PDF file reader
    # (legacy PyPDF2 names; PyPDF2 3.x uses PdfReader, reader.metadata, len(reader.pages))
    reader = PyPDF2.PdfFileReader(file)

    # Gather metadata
    metadata = reader.getDocumentInfo()
    num_pages = reader.getNumPages()

    # Compile metadata information
    meta_info = {
        "Author": metadata.author,
        "Creator": metadata.creator,
        "Producer": metadata.producer,
        "Subject": metadata.subject,
        "Title": metadata.title,
        "Number of Pages": num_pages
    }

# Display the result (notebook-style)
meta_info



{'Author': None,
 'Creator': 'TeX',
 'Producer': 'pdfTeX-1.40.25',
 'Subject': None,
 'Title': None,
 'Number of Pages': 1}

Checking PyPDF2

PyPDF2 is a free and open-source pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files. PyPDF2 can retrieve text and metadata from PDFs as well.

This aligns with what is noted in the OpenAI File uploads FAQ.


Obtaining dimensions and bounding boxes from AI vision is a skill called grounding.

You can, for example, see how Azure can augment gpt-4-vision with their own vision products.

Other AI vision products, such as MiniGPT-v2 (a Hugging Face Space by Vision-CAIR), can demonstrate grounding and identification.

Such metrics are needed as a basis for measurement.

gpt-4-vision alone might give you a description and be coaxed into extrapolation, but it is unlikely to be reliable.
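
To show why a bounding box is the basis for measurement, here is a minimal sketch of the conversion. It assumes a grounding-capable model has returned a pixel bounding box for the square, and that the image is a render of the full A0 page (841 × 1189 mm) with no margins cropped. The function name, parameters, and example numbers are hypothetical.

```python
# Sketch only: assumes the image covers the whole A0 page edge to edge.
A0_WIDTH_MM = 841.0
A0_HEIGHT_MM = 1189.0


def bbox_to_real_mm(bbox_px, image_size_px,
                    page_size_mm=(A0_WIDTH_MM, A0_HEIGHT_MM),
                    scale_denominator=50):
    """Convert a pixel bounding box to real-world width and height in mm.

    bbox_px:       (x_min, y_min, x_max, y_max) in pixels
    image_size_px: (width, height) of the full-page image in pixels
    """
    x_min, y_min, x_max, y_max = bbox_px
    img_w, img_h = image_size_px
    page_w, page_h = page_size_mm

    # Millimetres of paper represented by one pixel along each axis
    mm_per_px_x = page_w / img_w
    mm_per_px_y = page_h / img_h

    # Size on paper, then multiplied by the drawing scale
    width_mm = (x_max - x_min) * mm_per_px_x * scale_denominator
    height_mm = (y_max - y_min) * mm_per_px_y * scale_denominator
    return width_mm, height_mm


# Assumed example: page rendered at 2 px per mm, box 200 px on each side
width_mm, height_mm = bbox_to_real_mm((100, 100, 300, 300), (1682, 2378))
print(width_mm, height_mm)
```

Any error in the bounding box is multiplied by the scale, so at 1:50 a one-pixel error in this example already corresponds to 25 mm in the real world.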


I understand, thank you very much. What you said has given me an idea. I will research this too.
