Image files as context for Agents

I’m looking at the context management section of the docs and trying to figure out how to give an agent multiple image files as local context. For example, I could have images of a car, a train, a plane, and a bus, and use the system prompt to tell the agent that it can reference the appropriate image when generating its own image. The end user would not be aware of this.

In the Agent/LLM context section of the docs, one approach mentioned is exposing the data via function tools. Would I host the images in the cloud and then create a function tool that fetches the right one?
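
To make the question concrete, here is a rough sketch of what I imagine the body of such a tool could look like. It reads from local files rather than the cloud, and the names `REFERENCE_IMAGES` and `fetch_reference_image` and the file paths are made up for illustration. I'm assuming the function would then be registered as a function tool, and I don't know whether returning a base64 data URL is the right way to hand an image back to the model.

```python
import base64
import mimetypes
from pathlib import Path

# Hypothetical mapping from a reference name to a local image file.
REFERENCE_IMAGES = {
    "car": Path("images/car.png"),
    "train": Path("images/train.png"),
    "plane": Path("images/plane.png"),
    "bus": Path("images/bus.png"),
}


def fetch_reference_image(name, registry=None):
    """Return the named reference image as a base64 data URL.

    `registry` lets the caller swap in a different name-to-path mapping;
    by default the hypothetical REFERENCE_IMAGES table is used.
    """
    images = REFERENCE_IMAGES if registry is None else registry
    key = name.strip().lower()
    if key not in images:
        raise KeyError(f"No reference image named {name!r}")

    path = Path(images[key])
    # Guess the MIME type from the file extension, with a generic fallback.
    mime, _ = mimetypes.guess_type(path.name)
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"
```

The idea would be that the system prompt lists the available names (car, train, plane, bus) and the agent calls the tool with whichever one it needs.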

Another approach is to use retrieval or web search. Would that be the FileSearchTool? Would I need to store the image files in a vector store?

Any help would be appreciated!