I just did a deep dive into what you can expect for token usage (and rate-limit usage) across a variety of resolutions, detail settings, and models.
At detail:high, any two images that need the same number of tiles cost the same. For example, any size from 513x513 up to 1024x1024, including non-square sizes in between, results in 4 overlay tiles (on top of a base “low” image).
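A minimal Python sketch of that tile arithmetic, assuming the width and height passed in are what the model actually sees (after any internal resizing). The 85 base tokens and 170 tokens per 512x512 tile are the figures OpenAI has published for gpt-4o and gpt-4-turbo, and are an assumption here; other models use different constants. The helper names are hypothetical.

```python
import math

# Assumed constants (gpt-4o / gpt-4-turbo figures); other models differ.
BASE_TOKENS = 85    # the base "low" image
TILE_TOKENS = 170   # each 512x512 overlay tile
TILE_SIZE = 512


def tile_count(width: int, height: int) -> int:
    """512x512 tiles needed to cover an image the model already sees at this size."""
    return math.ceil(width / TILE_SIZE) * math.ceil(height / TILE_SIZE)


def high_detail_tokens(width: int, height: int) -> int:
    """Base image cost plus one charge per overlay tile."""
    return BASE_TOKENS + TILE_TOKENS * tile_count(width, height)


for w, h in [(512, 512), (513, 513), (700, 600), (768, 768)]:
    print(f"{w}x{h}: {tile_count(w, h)} tile(s), {high_detail_tokens(w, h)} tokens")
# 512x512: 1 tile(s), 255 tokens
# 513x513: 4 tile(s), 765 tokens
# 700x600: 4 tile(s), 765 tokens
# 768x768: 4 tile(s), 765 tokens
```

One pixel past 512 on each side is enough to jump from 1 tile to 4, after which everything up to the next tile boundary costs the same.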
There are also peculiarities in the internal downsizing, even on detail:high. Your image will be downsized so that its shortest side is at most 768 pixels. Send 3000x3000 and the model sees 768x768: 4 tiles of 512x512. Send 2000x500 and the model sees 2000x500 unchanged (the short side is already under 768), which is also 4 tiles of 512x512.
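The resize rule can be approximated in Python like this. It is a sketch under stated assumptions, not OpenAI's code: it assumes a first step that fits the image within a 2048x2048 square (as described in OpenAI's vision guide), then the cap described above that shrinks the shortest side to at most 768 px without ever upscaling. The rounding behavior and the helper names `high_detail_resize` and `tiles_seen` are hypothetical.

```python
import math

TILE_SIZE = 512


def high_detail_resize(width: int, height: int) -> tuple[int, int]:
    """Approximate the size the model sees at detail:high (assumed rules)."""
    # Step 1 (assumed): fit inside 2048x2048, preserving aspect ratio, no upscaling.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # Step 2: shrink so the shortest side is at most 768 px, no upscaling.
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    # Rounding to whole pixels is an assumption; it also absorbs float error.
    return round(w), round(h)


def tiles_seen(width: int, height: int) -> tuple[int, int, int]:
    """Return (seen_width, seen_height, tile_count) for an uploaded image size."""
    w, h = high_detail_resize(width, height)
    return w, h, math.ceil(w / TILE_SIZE) * math.ceil(h / TILE_SIZE)


for size in [(3000, 3000), (2000, 500)]:
    print(size, "->", tiles_seen(*size))
# (3000, 3000) -> (768, 768, 4)
# (2000, 500) -> (2000, 500, 4)
```

For the square image the 768 cap does all the work, while the wide image never triggers either step, yet both land on 4 tiles.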