Getting accurate object locations from the image API?

I am passing a JSON schema for the output to the image API, asking for a list of objects with their descriptions and bounding boxes. The bounding boxes I get back are roughly in the right place, but very loose. In Azure Cognitive Services I can get accurate bounding boxes for detected objects (but without a custom prompt), so I am wondering whether I could get the same accuracy here?

Here is an example of what I get, with the bounding boxes from the response JSON drawn onto the image. You can see that the bounding box coordinates have only a single decimal place. I use 512x512 images, so there shouldn't be any scaling involved.
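To put a number on how loose single-decimal coordinates can be, here is a minimal sketch. It assumes the boxes are normalized to the range 0 to 1 in `[x_min, y_min, x_max, y_max]` order, which is a guess at the response format and not something the API confirms. The function names and the sample box are made up for illustration.

```python
def to_pixels(box, width=512, height=512):
    """Scale a normalized [x_min, y_min, x_max, y_max] box to pixel coordinates.

    Assumes coordinates are fractions of the image size (hypothetical format).
    """
    x_min, y_min, x_max, y_max = box
    return (
        round(x_min * width),
        round(y_min * height),
        round(x_max * width),
        round(y_max * height),
    )


def max_rounding_error_px(decimals, size=512):
    """Worst-case error per box edge, in pixels, when a normalized
    coordinate is rounded to `decimals` decimal places."""
    return 0.5 * 10 ** (-decimals) * size


# Hypothetical box with single-decimal coordinates, as seen in the response.
box = [0.2, 0.3, 0.7, 0.9]
print(to_pixels(box))            # pixel corners on a 512x512 image
print(max_rounding_error_px(1))  # worst-case edge error with 1 decimal
print(max_rounding_error_px(3))  # worst-case edge error with 3 decimals
```

Under that assumption, one decimal place means each edge can only land on a grid of roughly 51 px steps, so it can be off by about 25 px on a 512x512 image from rounding alone, before any model error.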


For comparison, these are the results from Azure Cognitive Services.

Have you tried YOLO?