Image Understanding Tool Help

How can I create a tool that enables an agent to read and interpret images or other binary formats? I don’t want a tool that just describes the image, but one that actually allows the agent to access and understand its contents. Do you have any suggestions, I only found tools that describe images by returning a string. I want to load the Agents context with the understanding of a image content and not the image description.