GPT4 V Object detection bounding box value incorrect

Thiwanka · June 29, 2024, 7:07pm

I want to get the bounding boxes of the objects in a given image, but the bounding box values returned by ChatGPT are always incorrect.

Prompt used: Identify all objects in the provided image. Ensure that the bounding boxes tightly fit around objects with the exact coordinates (x_min, y_min, x_max, y_max) provided for the objects. Only include the object that can be clearly identified; if the object cannot be precisely detected, do not suggest it. The output should be a JSON array containing the object type, name, and coordinates for each identified photo frame. Ensure the bounding boxes accurately encapsulate the objects and the coordinates are exact.

_j · June 29, 2024, 7:39pm

GPT-4-Vision is not an object grounding model. It is not designed to perform entity extraction with bounding boxes.

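Images are also resized in most cases, so any coordinates the model reports need not correspond to the pixels of your original file. This is especially true in ChatGPT, where we can only guess that OpenAI uses techniques similar to those documented for the developer API.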
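To illustrate why resizing matters, here is a minimal sketch of mapping a box between two image sizes. The two-step downscale in `resized_dimensions` (fit within 2048 px, then shortest side to 768 px) is an assumption based on how the API's high-detail mode has been described; what ChatGPT actually does is not known. Both function names are hypothetical helpers, not part of any OpenAI library.

```python
def resized_dimensions(width, height, max_side=2048, short_side=768):
    """Estimate the image size after an ASSUMED two-step downscale.

    Step 1: shrink so the longest side is at most max_side.
    Step 2: shrink so the shortest side is at most short_side.
    The defaults are assumptions, not confirmed ChatGPT behaviour.
    """
    scale = min(1.0, max_side / max(width, height))
    w, h = width * scale, height * scale
    scale = min(1.0, short_side / min(w, h))
    return round(w * scale), round(h * scale)


def rescale_box(box, from_size, to_size):
    """Map (x_min, y_min, x_max, y_max) from one image size to another."""
    x_min, y_min, x_max, y_max = box
    sx = to_size[0] / from_size[0]  # horizontal scale factor
    sy = to_size[1] / from_size[1]  # vertical scale factor
    return (x_min * sx, y_min * sy, x_max * sx, y_max * sy)


original = (4096, 2048)
seen_by_model = resized_dimensions(*original)
# A box expressed in the resized image, mapped back to original pixels.
box_in_original = rescale_box((768, 384, 1536, 768), seen_by_model, original)
```

Even with a correct rescale, the mapping only helps if the model's coordinates were accurate in its own frame, which a non-grounding model does not guarantee.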

Here you can see how Azure AI Vision overlays simple grounding on top of GPT-4's verbose output to provide enhancements.

Your experience in ChatGPT is therefore as expected.
