GPT-4 Turbo with Vision incorrectly analyzed the image


I am working on a feature to read blood pressure information from photos of blood pressure monitors. Some images are accurately interpreted by the API, but others are not. For example, in the attached image, the systolic blood pressure reading is 166, but the API usually reads it as 156, sometimes as 165, and other times the error is much larger. Please help me.
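For context, here is a minimal sketch of the kind of vision API call involved, assuming the current OpenAI Python client; the model name, file name, and prompt are placeholders, not the original poster's setup:

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Encode the local photo of the blood pressure monitor as base64.
with open("bp_monitor.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4-turbo",  # illustrative; the thread used a vision preview model
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Read the systolic, diastolic and pulse values shown on this monitor.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
    max_tokens=200,
)

print(response.choices[0].message.content)
```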

1 Like

wow, that’s a brave application for a preview model!

do you get a more reliable result when you improve the angle of the image?

3 Likes

At this stage of the vision model’s development, text really needs to be within a degree or two of normal, level text orientation to be perceived accurately. It might be worth exploring more mature OCR software for this purpose, as digital-display-reading code already exists for heavy industrial automation.

2 Likes

I find it very interesting that this isn’t working well, I’ve tested it a bunch of times on odometer readings (mileage from a car) and it’s been 100% spot on each time.

Is it possible that the image is too large and when the model cuts it into 512 x 512 squares it’s splitting the numbers? Have you tried reducing the resolution or zooming out?
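If tile splitting were the culprit, a quick Pillow sketch for checking the dimensions and scaling the image down to fit inside a single 512 px tile (file names are placeholders):

```python
from PIL import Image

img = Image.open("bp_monitor.jpg")
print(img.size)  # (width, height)

# If either side is larger than one 512 px tile, scale the image down so the
# display is not split across tile boundaries.
if max(img.size) > 512:
    img.thumbnail((512, 512))  # resizes in place, preserving aspect ratio
    img.save("bp_monitor_small.jpg", quality=95)
```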

2 Likes

I’ve changed the perspective of the image, but it’s still incorrect.


I just straightened it and it worked, tried it twice for good measure

Edit: Just tried it through the API as well, I got 156 with the original image and a successful 166 with the straightened one. Seems like it just needs the text to be straight.
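For anyone reproducing the straightening step, a minimal Pillow sketch; the rotation angle is a guess you would adjust per photo:

```python
from PIL import Image

img = Image.open("bp_monitor.jpg")

# Rotate so the digits sit level; the angle here is illustrative only.
# expand=True keeps the corners, and a white fill avoids black wedges
# that can confuse OCR.
straightened = img.rotate(
    -12,
    expand=True,
    fillcolor="white",
    resample=Image.Resampling.BICUBIC,
)
straightened.save("bp_monitor_straight.jpg")
```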

1 Like

The dimensions of this image are only 360x360.

That’s expected; you can read about the current limitations of the vision model here: OpenAI Platform

I have tried many times. It doesn’t work as expected. It’s strange.

image
Try it again with this one

1 Like

That image is still rotated more than the acceptable limit, try straightening out the text by rotating the image.

2 Likes

I also tested the OCR abilities of vision-gpt on many cases, and it often misreads things. I think Google’s OCR via Google Document AI is currently the most reliable (I tested your image and it got it 100% correct, even with the slight tilt). It is also much cheaper for plain OCR. You could run Google OCR (or some other engine) to get the text and then give that to GPT for further processing/interpretation.
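A rough sketch of that two-step pipeline using the google-cloud-documentai client; the project, location, and processor IDs are placeholders, not values from the thread:

```python
from google.cloud import documentai

# Placeholder IDs; substitute your own Document AI OCR processor.
PROJECT_ID = "my-project"
LOCATION = "us"
PROCESSOR_ID = "my-ocr-processor"

client = documentai.DocumentProcessorServiceClient()
name = client.processor_path(PROJECT_ID, LOCATION, PROCESSOR_ID)

with open("bp_monitor.jpg", "rb") as f:
    raw_document = documentai.RawDocument(content=f.read(), mime_type="image/jpeg")

result = client.process_document(
    request=documentai.ProcessRequest(name=name, raw_document=raw_document)
)

# Plain text extracted by OCR; this is what you would then hand to GPT
# for interpretation (e.g. "which number is the systolic reading?").
print(result.document.text)
```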

2 Likes

Excellent, it is now accurate. Thank you very much. I tried to align the image earlier, but it still wasn’t precise.

The vision model base tile is 512x512.

I think a human can do it at 60x60

image

Also, you can imagine putting some green dots at the corners of the display for automatic perspective correction by other machine-vision code (a rough sketch follows below).

image
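A hedged OpenCV sketch of that green-dot idea: threshold for green in HSV, take the four blob centers, and warp the display to a flat rectangle. The HSV range and the 512x512 output size are assumptions, not values from the thread.

```python
import cv2
import numpy as np

img = cv2.imread("bp_monitor.jpg")

# Isolate the green markers (HSV range is a rough guess for bright green dots).
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (40, 80, 80), (80, 255, 255))

# Find the center of each dot.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
centers = []
for c in contours:
    m = cv2.moments(c)
    if m["m00"] > 0:
        centers.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))

if len(centers) == 4:
    # Order corners: top-left, top-right, bottom-right, bottom-left.
    pts = np.array(centers, dtype=np.float32)
    s = pts.sum(axis=1)
    d = np.diff(pts, axis=1).ravel()
    ordered = np.array(
        [pts[s.argmin()], pts[d.argmin()], pts[s.argmax()], pts[d.argmax()]],
        dtype=np.float32,
    )

    # Warp the display to a flat 512x512 view (size is arbitrary).
    dst = np.array([[0, 0], [511, 0], [511, 511], [0, 511]], dtype=np.float32)
    matrix = cv2.getPerspectiveTransform(ordered, dst)
    flat = cv2.warpPerspective(img, matrix, (512, 512))
    cv2.imwrite("bp_monitor_flat.jpg", flat)
```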

2 Likes

OCR is all about contrast (more contrast = less noise).


Contrast

image

image

Analyse the text and numbers on this picture. Report them back in plain text.

The image shows a digital blood pressure monitor with the following readings displayed:

  • Time: 6:41 PM
  • Systolic blood pressure: 165 mmHg
  • Diastolic blood pressure: 125 mmHg
  • Pulse: 80 beats per minute
  • Memory slot indicator: 53

On the monitor, there are also labels and symbols indicating the type of measurement each number represents, such as “SYS.” for Systolic, “DIA.” for Diastolic, and a heart symbol for Pulse. There are buttons with symbols for power and settings adjustments as well.


Threshold

You can also just raise the threshold levels.
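A minimal Pillow sketch of the contrast and threshold steps shown above; the enhancement factor and threshold value are just example numbers:

```python
from PIL import Image, ImageEnhance

img = Image.open("bp_monitor.jpg").convert("L")  # grayscale

# Boost contrast before thresholding.
img = ImageEnhance.Contrast(img).enhance(2.0)

# Binarize: everything above the threshold becomes white, the rest black.
THRESHOLD = 128  # example value; raise it for a more aggressive cut
bw = img.point(lambda p: 255 if p > THRESHOLD else 0)
bw.save("bp_monitor_bw.png")
```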


Rotate

When rotating (and adding contrast), the 5 and 6 are still “swapped”.

1 Like

I just rotated the image:

Correct! For the test, https://scantextai.com could be used.