I’m trying to get GPT-4 Vision to read a radial pressure dial. It is doing very poorly, even when provided with several examples in the context window.
It's quite similar to the example in Microsoft's research where they got it to read a speed dial by providing 2 examples. They eventually succeeded; I have not, even with 6 examples. (Can't include the link per forum rules, sorry! It's pg 22, section 3.4.)
So here are some of the example images I am giving it. I have given it up to 6 examples in the context, but even then it still gives a totally wrong answer.
I have already cleaned up the images by removing all but the dial, and have added a yellow line for the needle.
ANY suggestions welcome, please!
Prompt:
You are a visual expert and are an excellent AI system who can read radial dials
To help you i am giving you some examples of what the dial I want you to read looks like under some settings
For Additional context, the large ticks are
0 (labeled),
0.1,
0.2 (labeled),
0.3,
0.4 (labeled),
0.5,
0.6 (labeled),
0.7,
0.8 (labeled),
0.9,
1 (labeled)
Small ticks are 0.02.
I want you to read the pressure dial in image (t1). Use the LONG YELLOW NEEDLE as the pointer.
The LONG YELLOW NEEDLE Goes from the Centre of the dial to almost reach the outer edge of the dial.
It is those long part that nearly touches the edge of the dial that points to the reading we want to log
CAN YOU FIND THE NEEDLE IN THE DIAL in image t1 AND JUST TELL ME WHERE YOU THINK THE NEEDLE POINTS TO PLEASE. Do your best. thank you.
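For anyone who wants to reproduce this, here is a minimal sketch of how a prompt like the one above could be interleaved with captioned example images and the target image in a single user message. It assumes the OpenAI-style chat content layout with `text` and `image_url` parts and base64 data URLs. The helper names `image_part` and `build_few_shot_content`, the PNG format, and the default target caption are hypothetical, chosen for illustration, and not necessarily my exact setup.

```python
import base64


def image_part(png_bytes: bytes) -> dict:
    """Wrap raw PNG bytes as an image part using a base64 data URL."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{encoded}"},
    }


def build_few_shot_content(prompt, examples, target_png,
                           target_caption="This is image (t1)."):
    """Build the content list: prompt, then (caption, image) pairs, then target.

    `examples` is a list of (caption, png_bytes) tuples. The caption is the
    worked reading for that example image. `target_caption` is an assumed
    label for the image to be read.
    """
    content = [{"type": "text", "text": prompt}]
    for caption, png_bytes in examples:
        content.append({"type": "text", "text": caption})
        content.append(image_part(png_bytes))
    content.append({"type": "text", "text": target_caption})
    content.append(image_part(target_png))
    return content


# Placeholder bytes stand in for real image files here.
content = build_few_shot_content(
    "Read the dial in image (t1).",
    [("This is image (s1). ...", b"fake-s1"), ("This is image (s2). ...", b"fake-s2")],
    b"fake-t1",
)
```

The resulting list would go in the `content` field of one user message. Putting each caption directly before its image is a design choice meant to keep the label and the picture unambiguous to the model.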
Then I give up to 6 examples like this:
"This is image (s1). Here the long yellow needle is one short tick below the long tick that represents 0.2. Therefore it is 0.2 - 0.02 = .18"
I will get an answer like this:
In image (t1), the long yellow needle is positioned just slightly beyond the second small tick after the 0.2 large tick mark. Given that each small tick represents an increment of 0.02, and the needle is just beyond the second small tick, we can determine the value the needle is pointing to by adding two increments of 0.02 to the value of the previous large tick, which is 0.2. Therefore, the reading on the dial would be:
0.2 (value at large tick) + (2 x 0.02) (value of two small ticks) = 0.2 + 0.04 = 0.24 MPa.
So, the needle in image (t1) is pointing to approximately 0.24 MPa.
Which is wrong. The correct answer is 0.34 MPa.
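To make the tick arithmetic explicit, here is a small sketch using only the scale described in the prompt (large ticks every 0.1, small ticks of 0.02). The function name `dial_reading` is hypothetical. It shows that the correct 0.34 can be written as 0.3 + 2 x 0.02, while the model's 0.24 is 0.2 + 2 x 0.02, so the two differ by exactly one large tick. That might mean the small-tick count is fine and the model is anchoring on the labeled 0.2 instead of the unlabeled 0.3, but that is only a guess.

```python
SMALL_TICK = 0.02  # value of one small tick, as stated in the prompt


def dial_reading(large_tick: float, small_ticks: int) -> float:
    """Large tick value plus a signed number of small ticks, rounded to 2 dp."""
    return round(large_tick + small_ticks * SMALL_TICK, 2)


example_s1 = dial_reading(0.2, -1)   # example (s1): one small tick below 0.2
model_answer = dial_reading(0.2, 2)  # what the model described for (t1)
expected = dial_reading(0.3, 2)      # one way to express the correct 0.34
error = round(expected - model_answer, 2)

print(example_s1, model_answer, expected, error)  # 0.18 0.24 0.34 0.1
```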