Computer vision using Language models

I tried some experiments and was impressed that Language models are capable of vision even they may never saw any image before