When does Vision process images exactly?

According to https://platform.openai.com/docs/guides/vision/managing-images:

If you want to pass the same image to the model multiple times, you will have to pass the image each time you make a request to the API.

Does this mean that if I sent an image as part of a conversation transcript and I ask the bot “What time is it?” it will still read/parse/see the image (and charge me tokens) even if it’s not attached to the last message and I’m not asking about it?

When does Vision process images exactly?