Impressive Image Understanding

Today I asked Codex to insert an image of a cat and then entered the prompt, “Make it so that when you click on the cat’s eyes make text appear underneath saying ‘You clicked the eye!’ for 3 seconds.” I did not expect it to work, but to my surprise it did. Even though I inserted a random picture of a cat I found on the internet, it was able to detect where both of the cat’s eyes were and make only that area clickable. Does anybody know how it might have accomplished this?


  1. Can you try clicking anywhere else on the image?
  2. Can you try with another image?

Yep, I clicked on many other locations on the image, and only the eyes triggered the text. I haven’t tried it with a different picture yet, but I’ll do that as soon as possible, because maybe the AI just got incredibly lucky.

I ran the same tests. The eyes were probably in the middle of your image: sometimes Codex puts a 50x50 box in the middle of the image as the clickable area, and sometimes the whole image triggers the function. In my tests, Codex was unable to identify things inside a picture.
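
For anyone who wants to check this on their own image, here is a rough sketch of what a centered 50x50 click box amounts to. This is not Codex's actual output; the names `Rect`, `centeredBox`, and `isInside` are made up for illustration, and the 400x300 image size is just an assumed example.

```typescript
// Hypothetical helper names; a sketch of the "box in the middle" behavior,
// not the code Codex generated.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Build a square box of the given size centered in an image.
function centeredBox(imageWidth: number, imageHeight: number, size = 50): Rect {
  return {
    x: (imageWidth - size) / 2,
    y: (imageHeight - size) / 2,
    width: size,
    height: size,
  };
}

// True when a click (measured from the image's top-left corner)
// lands inside the box.
function isInside(box: Rect, clickX: number, clickY: number): boolean {
  return (
    clickX >= box.x &&
    clickX < box.x + box.width &&
    clickY >= box.y &&
    clickY < box.y + box.height
  );
}

// Assumed example: a 400x300 image, so the box covers x 175..225, y 125..175.
const box = centeredBox(400, 300);
console.log(isInside(box, 200, 150)); // true: the center of the image
console.log(isInside(box, 10, 10)); // false: the top-left corner
```

In a browser you could feed this the `offsetX` and `offsetY` of a click event on the image. If the eyes in a given picture happen to sit near the center, a box like this would look like eye detection even though nothing in the image was analyzed, so trying a picture with the eyes well off-center should show the difference.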