Differences in token count by usage

For plain text, I know the rule of thumb that one English word is roughly 1.33 tokens (so tokens ≈ word count × 1.33).
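To make that rule of thumb concrete, here is a minimal sketch of the estimate I mean. The function name `estimate_tokens_from_words` and the whitespace-based word split are my own assumptions for illustration; this is only an approximation, not how the tokenizer actually counts.

```python
def estimate_tokens_from_words(text: str, tokens_per_word: float = 4 / 3) -> int:
    """Rough token estimate: about 1.33 tokens per English word.

    Hypothetical helper for illustration only. Words are counted by
    splitting on whitespace, which is itself an approximation.
    """
    word_count = len(text.split())
    return round(word_count * tokens_per_word)


if __name__ == "__main__":
    sample = "If I create an image via text prompt"
    print(estimate_tokens_from_words(sample))
```

For exact counts on real text, a tokenizer library such as `tiktoken` would be the better tool; the ratio above is only useful for quick budgeting.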

For GPT-4-Vision, the token count for an input image depends on the image size: total tokens = 85 + 170 * n, where n is the number of 512x512 tiles the image is split into.
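As a sketch of how I understand the formula above, the snippet below computes n and the total. The resizing steps (fit within 2048x2048, then shrink so the shortest side is at most 768 px) and the flat 85 tokens for low detail are my reading of the documented procedure, so treat them as assumptions; the function name `estimate_vision_tokens` is hypothetical.

```python
import math


def estimate_vision_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate input-image tokens as 85 + 170 * n (n = number of 512px tiles).

    Assumptions (not guaranteed to match the API exactly):
    - low detail costs a flat 85 tokens;
    - high detail first fits the image inside 2048x2048,
      then scales it down so the shortest side is at most 768 px.
    """
    if detail == "low":
        return 85

    # Step 1 (assumed): fit within a 2048x2048 square, never upscaling.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale

    # Step 2 (assumed): shrink so the shortest side is at most 768 px.
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale

    # Step 3: count the 512x512 tiles needed to cover the resized image.
    n = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * n


if __name__ == "__main__":
    print(estimate_vision_tokens(1024, 1024))
    print(estimate_vision_tokens(2048, 4096))
```

Under these assumptions a 1024x1024 image is resized to 768x768 and covered by 4 tiles, which is where the "n" in the formula comes from.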

If I generate an image from a text prompt with a model like DALL·E 3, is it possible to calculate the number of tokens in the output image?

I know that DALL·E 3 only offers three sizes (1024x1024, 1024x1792, and 1792x1024), but I don’t know how to translate these into tokens.

So I tried to work backwards from the GPT-4 "price per 1K tokens" figure in the OpenAI pricing table, but that got confusing once I saw that DALL·E 2, DALL·E 3, and DALL·E 3 HD are priced differently for the same image size.

Also, is it correct to estimate speech-related tokens (input for Whisper, output for TTS) from the average number of words in the speech?
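For clarity, this is the kind of word-based estimate I have in mind for speech, as a rough sketch only. The speaking rate of 150 words per minute and the function name `estimate_speech_tokens` are my own assumptions, and whether the speech APIs are actually metered in tokens at all is exactly what I'm unsure about.

```python
def estimate_speech_tokens(
    duration_seconds: float,
    words_per_minute: float = 150.0,  # assumed average speaking rate
    tokens_per_word: float = 4 / 3,   # same ~1.33 rule of thumb as for text
) -> int:
    """Estimate tokens for a speech clip from its duration.

    Hypothetical helper: converts duration to an estimated word count,
    then applies the words-to-tokens rule of thumb.
    """
    estimated_words = duration_seconds / 60.0 * words_per_minute
    return round(estimated_words * tokens_per_word)


if __name__ == "__main__":
    # One minute of speech at the assumed rate
    print(estimate_speech_tokens(60))
```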