Finding the accurate location of objects from an image API?

For comparison, these are the results from Azure Cognitive Services