Gpt-4o is not recognising the image properly

We are facing problem with chat completion api. The image recognised is wrong in the chat completion API. When I gave the below image, the response was not correct. The question was “find the slope of the line”. The response I received was that the two points are (-1, -4) (5, 2), but the correct points are (0, -3) (4, 0). The image is not working properly.

    "model": "gpt-4o",
                "role" => "user",
                "content" => [{
                              "type" => "text",
                              "text" => "find the slope of the line"
                              "type" => "image_url",
                              "image_url" => {
                                             "url" => "imageurl"

Welcome to the Forum!

Give it a try with the following prompt:

Please identify the coordinates where the blue line intersects the x and y axis.

Worked well for me - though I will note that the results are not necessarily consistent across API requests. It tends to vary a bit between +/- 3 and +/- 4. However, it is not as far off as in your case.

NB: Based on your post, it does not seem like you are expecting a value for the slope of the line. Hence the change in prompt.

I handed the image over to ChatGPT using gpt-4o, and more extensive techniques that could be used, even hinting that code interpreter was available.

Let’s identify two points from the graph:

  • The line crosses at (0,−4) and (6,2).

Vision failed, where the correct answer by my own looking would be (0, -3), (6, 1.8) approximately, a slope of 0.8. I even gave instruction to interpolate the values between grid, which was not done.

I improved the instruction, telling more about determining line crossing points:

From visual inspection, we can estimate the coordinates of these crossing points.

Let’s select a few points where the line crosses the grid lines:

(0, -4)
(1, -3)
(2, -2)
(3, -1)
(4, 0)
(5, 1)
(6, 2)

It even sent the image to python and had its own pixel grid overlaid. Nope.

This kind of vision analysis can be boosted in quality by multi-shot examples preceding the final problem. Lead-up messages of user input with image accompanied with correct simulated AI answers can both orient it to the task with more context and also improve the actual analysis.