Estimation of measurements in images or PDFs

Obtaining dimensions and bounding boxes from AI vision is a skill called grounding.

You can, for example, see how Azure can augment gpt-4-vision with their own vision products.

Other AI vision products like MiniGPT-v2 - a Hugging Face Space by Vision-CAIR can demonstrate grounding and identification.

Such metrics are needed as a basis for measurement.

gpt-4-vision alone might give you a description and be coaxed into extrapolation, but it is unlikely to be reliable.

2 Likes