When I search, all the results that come back are about turning a description into an image, but I want to do the opposite. I want to start with an image and have GPT-3 describe what the image shows or, even better, have it build a description that also draws on the surrounding text (I am processing webpages).
You might take a look at models like BASIC-L and CoCa; there are lots of image classification models out there. A ready-built option would be the Microsoft Image Processing API, and there are others.
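Since GPT-3 only takes text as input, one way to combine the two is to get a caption from a vision model first, then pass that caption together with the surrounding page text to GPT-3. Below is a minimal sketch of that idea in Python. The names `build_description_prompt`, `describe_image`, `captioner` and `completer` are all hypothetical and not part of any library: `captioner` stands in for whatever image model or API you pick, and `completer` stands in for your GPT-3 call. The 500-character context limit is an arbitrary assumption.

```python
from typing import Callable


def build_description_prompt(
    caption: str, surrounding_text: str, max_context_chars: int = 500
) -> str:
    """Combine an image caption and nearby page text into one text prompt."""
    # Collapse runs of whitespace/newlines left over from HTML extraction.
    context = " ".join(surrounding_text.split())
    # Truncate long context so the prompt stays small (limit is an assumption).
    if len(context) > max_context_chars:
        context = context[:max_context_chars].rstrip() + "..."
    return (
        "Image caption (from a vision model): " + caption.strip() + "\n"
        "Text surrounding the image on the page: " + context + "\n"
        "Write a one-paragraph description of the image that uses both."
    )


def describe_image(
    image_bytes: bytes,
    surrounding_text: str,
    captioner: Callable[[bytes], str],   # hypothetical: wraps your vision model
    completer: Callable[[str], str],     # hypothetical: wraps your GPT-3 call
) -> str:
    """Caption the image, then ask the text model to enrich the caption."""
    caption = captioner(image_bytes)
    prompt = build_description_prompt(caption, surrounding_text)
    return completer(prompt)
```

You would plug in your own functions for the two callables, for example one that sends the image to a captioning service and one that sends the prompt to a completion endpoint. Keeping them as parameters means you can swap the vision model without touching the prompt logic.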