Finding accurate location of objects from image API?

PianoGamer · March 23, 2025, 12:20am

I am passing a JSON schema for the output to the image API, asking for a list of objects with their descriptions and bounding boxes. The bounding boxes I get are roughly related to the true locations, but very loose. In Azure Cognitive Services I can get accurate bounding boxes for detected objects (but without a custom prompt), so I am wondering whether I could get the same accuracy here?

Here is an example of what I get, with the bounding boxes from the response JSON drawn onto the image. You can see the bounding box values have only a single decimal place. I use 512x512 images, so there shouldn't be any scaling involved.
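To show why single-decimal values matter, here is a rough sketch of the conversion to pixels. It assumes the boxes come back as fractions of the image size in the range 0 to 1; the thread does not confirm this, and the key names `x_min`, `y_min`, `x_max`, `y_max` are hypothetical. Under that assumption, one decimal place means each edge can only land on a grid of about 51 pixels on a 512x512 image, which alone would make the boxes look loose.

```python
from typing import Dict, Tuple


def to_pixel_box(
    box: Dict[str, float], width: int = 512, height: int = 512
) -> Tuple[int, int, int, int]:
    """Convert a normalized box to integer pixel coordinates.

    Assumes values are fractions of the image size (0..1) and that the
    keys are x_min, y_min, x_max, y_max (hypothetical names).
    """
    return (
        round(box["x_min"] * width),
        round(box["y_min"] * height),
        round(box["x_max"] * width),
        round(box["y_max"] * height),
    )


def grid_step(size: int, decimals: int = 1) -> float:
    """Smallest pixel step representable at the given decimal precision."""
    return size / 10 ** decimals


# Made-up box, for illustration only.
example = {"x_min": 0.2, "y_min": 0.3, "x_max": 0.7, "y_max": 0.9}
print(to_pixel_box(example))  # pixel corners of the box
print(grid_step(512))         # pixel granularity at one decimal place
```

If the values are already in pixels or on some other scale, the conversion above does not apply as written.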

PianoGamer · March 23, 2025, 1:01am

For comparison, these are the results from Azure Cognitive Services:
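One way to put a number on the difference between the two sets of boxes is intersection over union (IoU), a standard overlap measure for bounding boxes. This is only a sketch: it assumes both boxes are given as `(x1, y1, x2, y2)` pixel corners, and the sample values are made up rather than taken from the images in this thread.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap, clamped to zero if the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Made-up boxes, for illustration only.
loose_box = (102.0, 154.0, 358.0, 461.0)
tight_box = (130.0, 170.0, 340.0, 430.0)
print(f"IoU: {iou(loose_box, tight_box):.2f}")
```

A value near 1 means the boxes nearly coincide, and a value near 0 means they barely overlap.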

jochenschultz · March 23, 2025, 7:57am

Have you tried YOLO?
