GPT-4V object detection: bounding box values are incorrect

I want to identify object bounding boxes in a given image, but the bounding box values returned by ChatGPT are always incorrect.
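One way to measure how far off the boxes are is to compare each returned box against a box you draw by hand, using intersection over union (IoU). The sketch below assumes both boxes are `(x_min, y_min, x_max, y_max)` tuples in the same pixel space. The function name and the example coordinates are hypothetical. An IoU of 1.0 means a perfect match, and 0.0 means no overlap.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle; width/height clamp to 0 when the boxes do not intersect.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical example: a hand-drawn box vs. a box returned by the model.
truth = (100, 50, 300, 250)
predicted = (150, 100, 350, 300)
print(round(iou(truth, predicted), 3))  # 0.391
```

A common rule of thumb in detection benchmarks treats IoU of about 0.5 or higher as a hit, so a score like 0.391 would count as a miss.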

Prompt used: Identify all objects in the provided image. Ensure that the bounding boxes tightly fit around objects with the exact coordinates (x_min, y_min, x_max, y_max) provided for the objects. Only include the object that can be clearly identified; if the object cannot be precisely detected, do not suggest it. The output should be a JSON array containing the object type, name, and coordinates for each identified photo frame. Ensure the bounding boxes accurately encapsulate the objects and the coordinates are exact.
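Whatever the prompt says, it helps to sanity-check the JSON that comes back before trusting it. This is a minimal sketch that assumes the reply has already been stripped down to a bare JSON array, and that each element uses the hypothetical keys `"type"`, `"name"` and `"coordinates"` (a 4-item pixel list). It only catches boxes that are impossible for the image size. A box that passes can still be in the wrong place.

```python
import json


def validate_boxes(reply_text, img_width, img_height):
    """Split model-reported boxes into plausible and impossible ones.

    Assumes reply_text is a bare JSON array whose items have the
    hypothetical key "coordinates" = [x_min, y_min, x_max, y_max] in pixels.
    """
    valid, invalid = [], []
    for obj in json.loads(reply_text):
        x_min, y_min, x_max, y_max = obj["coordinates"]
        # Box must be non-empty and lie entirely inside the image.
        in_bounds = (0 <= x_min < x_max <= img_width
                     and 0 <= y_min < y_max <= img_height)
        (valid if in_bounds else invalid).append(obj)
    return valid, invalid


reply = ('[{"type": "frame", "name": "photo frame", "coordinates": [40, 30, 220, 200]},'
         ' {"type": "frame", "name": "photo frame", "coordinates": [500, 80, 1400, 300]}]')
ok, bad = validate_boxes(reply, 1024, 768)
print(len(ok), len(bad))  # 1 1
```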

GPT-4 Vision is not an object grounding model. It was not designed to extract entities together with bounding boxes.

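Images are also resized in most cases, so any coordinates the model reports may not correspond to the pixel dimensions of the image you uploaded. For ChatGPT we can only guess, but OpenAI likely uses techniques similar to those documented for the developer API.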
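To see how large that mismatch can be, here is a sketch of the resizing OpenAI has described for the API's high-detail mode: fit the image within a 2048 x 2048 square, then scale so the shortest side is 768 px. Two assumptions are baked in: the resize only ever shrinks, and ChatGPT follows the same rules, which is not documented. The helper names are hypothetical. Rescaling only fixes the coordinate space. It does not make the model's boxes accurate.

```python
def high_detail_size(width, height):
    """Approximate resized dimensions under the API's high-detail rules.

    Assumption: fit within 2048x2048, then shortest side 768 px, never upscaling.
    """
    # Step 1: fit within a 2048 x 2048 square, keeping the aspect ratio.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # Step 2: shrink so the shortest side is at most 768 px.
    scale = min(1.0, 768 / min(w, h))
    return round(w * scale), round(h * scale)


def to_original(box, resized, original):
    """Map an (x_min, y_min, x_max, y_max) box from resized to original pixels."""
    sx = original[0] / resized[0]
    sy = original[1] / resized[1]
    x_min, y_min, x_max, y_max = box
    return (x_min * sx, y_min * sy, x_max * sx, y_max * sy)


original = (4032, 3024)  # e.g. a 12 MP phone photo
resized = high_detail_size(*original)
print(resized)  # (1024, 768)
print(to_original((100, 100, 200, 200), resized, original))  # (393.75, 393.75, 787.5, 787.5)
```

In this example every coordinate is off by a factor of almost 4 if it is read in the wrong pixel space.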

Azure AI Vision, for example, overlays simple grounding on top of GPT-4's verbose output to provide these enhancements.

The results you are getting in ChatGPT are therefore what you should expect.
