There is also CLIP, an open-source project from OpenAI (on GitHub, last release Feb 20, 2023), described as: “A Contrastive Language-Image Pre-Training neural network trained on a variety of image/text pairs”. It can be installed locally, so it might be worth taking a look at.