How can you use visual input?

I was surprised to see that a third-party app, Genie AI chatbot, which uses GPT-4, has an option for image recognition. I tested it and it was amazing. How do they have this feature when it is still not available in ChatGPT Plus? I hope someone can help me, because I need it in my project.


That 100% was not OpenAI’s GPT-4 Image Interpreter at work.

At this moment, only ‘Be My Eyes’ (an app that aims to help blind or otherwise visually impaired users) has exclusive access to it, and even then you still have to get on a wait list (and be visually impaired, of course; it’d be very cruel to hop onto the list if you’re sighted, thereby possibly stealing the spot of someone who actually needs it for daily functioning).

What you encountered was most likely an open-source image interpreter. There are heaps of them out there (LLaVA, MiniGPT-4, BLIP-2, etc.). These are being used right now in a lot of hot and upcoming apps like MemeCam. Without knowing the exact model behind some of these apps, some people (especially those less familiar with AI) just assume it’s GPT-4’s native multi-modality at work. MiniGPT-4 in particular could mislead users with its name; it sounds like it could be a lightweight version of OpenAI’s GPT-4 model, but it has absolutely nothing to do with OpenAI.
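For illustration, here is a rough sketch of how an app could bolt image understanding onto a text-only model: a separate captioning model describes the image, and that description is passed along as plain text. This is only a guess at the general pattern, not how Genie or any specific app is actually built. The names `build_prompt`, `answer_about_image`, `caption_image`, and `ask_llm` are all hypothetical, and the two callables stand in for whatever captioning model and chat API you would plug in.

```python
from typing import Callable


def build_prompt(caption: str, question: str) -> str:
    """Turn an image caption plus a user question into a text-only prompt."""
    return (
        "An image was described by a separate captioning model as:\n"
        f"{caption.strip()}\n\n"
        f"Using only that description, answer: {question.strip()}"
    )


def answer_about_image(
    image_path: str,
    question: str,
    caption_image: Callable[[str], str],  # hypothetical wrapper around e.g. LLaVA or BLIP-2
    ask_llm: Callable[[str], str],        # hypothetical wrapper around a text-only chat model
) -> str:
    """Caption the image first, then hand the caption to a text-only model."""
    caption = caption_image(image_path)
    return ask_llm(build_prompt(caption, question))


if __name__ == "__main__":
    # Stand-ins so the sketch runs without any model download or API key.
    fake_captioner = lambda path: "a cat sitting on a laptop keyboard"
    echo_llm = lambda prompt: prompt  # just returns the prompt it was given
    print(answer_about_image("cat.jpg", "What animal is this?", fake_captioner, echo_llm))
```

In a setup like this the text model never sees the pixels, only the caption, so the quality of the answer depends heavily on the captioning model.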

Right now, it is just not possible for any developer to create and release a live product that uses it. I’ve heard that, other than ‘Be My Eyes’, a very select number of people do have access to it, but surely no developer is currently allowed to actually release a live app that incorporates the GPT-4 image interpreter.

If the website that you have been using explicitly claims to use OpenAI GPT-4’s native multi-modality, then you have been lied to.

Thank you, Kevin, for the clarification. I also suspected it was something other than GPT-4; I just didn’t know what tool could be integrated with GPT-4 to add the image recognition feature.

Do you know which of the models you listed is the best one, or would you recommend one of them?