Giving ChatGPT the Ability to Mark Up Images


I was wondering whether ChatGPT could be given the ability to mark up images. I appreciate GPT-4 Vision, but there have been times when I gave ChatGPT a drawing of a complex chemical structure or a UML diagram and asked a question about it, and it seemed to spend a lot of time trying to pinpoint the part it was referring to. Wouldn’t it be easier to let ChatGPT mark up portions of the image and then explain them to the user: the green part does this, the yellow part does that, the purple part does something else, and so on? Essentially, it would be like giving ChatGPT a pen and letting it draw on top of an image the way a human would. If this feature could be added, that would be great. Thank you!

It would be “easy”, but not easy for gpt-4-vision. Grounding, bounding boxes, entity identification, and the like are not capabilities of that model.

Azure, for example, can layer different vision models to perform such a task.
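
To make the layering idea concrete, here is a minimal sketch in plain Python of the second half of such a pipeline: taking labeled bounding boxes and drawing a colored outline for each, along with a legend the language model could refer to ("the green part does this"). Everything in it is an assumption for illustration: the detections are hard-coded stand-ins for whatever a separate detection model would return, the image is just a grid of RGB tuples, and `blank_image`, `draw_box`, and `annotate` are hypothetical helper names, not part of any Azure or OpenAI API.

```python
from typing import Dict, List, Tuple

Color = Tuple[int, int, int]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive pixel coords
Image = List[List[Color]]        # rows of RGB pixels; a stand-in for a real image

# Named colors, so the text answer can say "the green part", "the yellow part", ...
PALETTE: Dict[str, Color] = {
    "green": (0, 200, 0),
    "yellow": (255, 220, 0),
    "purple": (160, 32, 240),
}


def blank_image(width: int, height: int, fill: Color = (255, 255, 255)) -> Image:
    """Create a width x height image filled with one color."""
    return [[fill for _ in range(width)] for _ in range(height)]


def draw_box(image: Image, box: Box, color: Color) -> None:
    """Draw a one-pixel rectangle outline, clipped to the image bounds."""
    height, width = len(image), len(image[0])
    left, top, right, bottom = box
    left, right = max(0, left), min(width - 1, right)
    top, bottom = max(0, top), min(height - 1, bottom)
    if left > right or top > bottom:
        return  # the box lies entirely outside the image
    for x in range(left, right + 1):
        image[top][x] = color
        image[bottom][x] = color
    for y in range(top, bottom + 1):
        image[y][left] = color
        image[y][right] = color


def annotate(image: Image, detections: List[Tuple[str, Box]]) -> Dict[str, str]:
    """Outline each detection in its own color; return {color name: label}."""
    if len(detections) > len(PALETTE):
        raise ValueError("more detections than available colors")
    legend: Dict[str, str] = {}
    for (label, box), name in zip(detections, PALETTE):
        draw_box(image, box, PALETTE[name])
        legend[name] = label
    return legend


if __name__ == "__main__":
    # Hypothetical detector output: (label, bounding box) pairs.
    detections = [
        ("benzene ring", (1, 1, 6, 6)),
        ("hydroxyl group", (8, 2, 12, 5)),
    ]
    image = blank_image(20, 10)
    for color, label in annotate(image, detections).items():
        print(f"The {color} box marks the {label}.")
```

In a real setup the hard-coded detections would come from a grounding or object-detection model, and the drawing would be done with an imaging library rather than a list of lists. The point is only that the marking up happens outside the language model, which then talks about the regions by color.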