Watching the [ChatGPT can now see, hear, and speak](video about image chatting) video got me thinking…
The thumbnail shows the image zoomed in with a part circled. I initially thought this was going to be from ChatGPT.
While it was still very impressive, it got me thinking—how awesome would it be if you could send ChatGPT a picture of something and it could draw on the image (circles, arrows, etc) to point things out to you…
Especially if it was able to connect to DALL-E to produce illustrated guides.
Hell, connect it to the Internet too.
In the future I imagine a model will:
- Accept the picture of the bike
- Identify the bike brand and model
- Locate the manual for the bike
- Provide detailed and illustrated step-by-step instructions for lowering the seat including a picture and description of the required tool
In the far future maybe it’ll create a quick tutorial video where an avatar demonstrates lowering the seat on an exact copy of the bike…
2 Likes
It is doable. The AI already knows the position of the object in the image. If you check other object/face recognition projects on the web, they usually show a bounding box around the detected parts, even in real time. But if they implement it in ChatGPT, I hope they'll use a scribbled circle, as if it were drawn by a pen or marker. It would be more visually pleasing that way.
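Detectors usually hand back an axis-aligned bounding box, and turning that into a marker-style circle is just geometry. A minimal pure-Python sketch (the wobble term fakes the hand-drawn look):

```python
import math

def bbox_to_scribble(x0, y0, x1, y1, points=64):
    """Turn a detector bounding box into a slightly wobbly circle,
    approximating a circle drawn by hand with a marker."""
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2    # box centre
    r = max(x1 - x0, y1 - y0) / 2 * 1.15     # radius with a 15% margin
    pts = []
    for i in range(points):
        t = 2 * math.pi * i / points
        wobble = 1 + 0.03 * math.sin(5 * t)  # deterministic "hand" jitter
        pts.append((cx + r * wobble * math.cos(t),
                    cy + r * wobble * math.sin(t)))
    return pts
```

Feed the returned points to any polyline-drawing call (e.g., Pillow's `ImageDraw.line`) to overlay the circle on the photo.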
1 Like
When a plugin generates an image how can the model see it?
1 Like
Oh, it’s absolutely doable but it’s another layer on top of what they’re already working on.
I don’t expect we’ll see it this year, or even maybe next.
1 Like
I agree, OpenAI’s team creates incredible programs, but it also takes time. After GPT-4’s release they had more time to work on DALL-E and recently announced DALL-E 3. They switch their main focus between different products, and I personally am fine with it.
1 Like
I’m pretty sure they are mostly different teams working on their own products. The underlying technologies are very different.
1 Like
I’ve been following Prafulla Dhariwal on Twitter since the “Jukebox” days (2020). He’s definitely involved with multiple projects.
https://twitter.com/prafdhar
Bio: “Co-creator of GPT-3, DALL-E 2, Jukebox, Glow, PPO. Researcher”
1 Like
Do note I wrote “mostly,” not “entirely.” There’s absolutely overlap; my point was more to illustrate that OpenAI can and does work on multiple products simultaneously.
1 Like
Yeah, sorry; I almost edited to add that they can and definitely do multitask, with lots of brilliant people. It was just an opportunity to mention Prafulla, whom I totally admire.
1 Like
Gadcuit
I think it should be open to users who don’t use GPT-4 as well. The reason is that it should already be available for all NLP models.
2 Likes
I hope it’s coming sooner. I think the building blocks are ready. DALL-E 3 works like that, back and forth… I’m sure GPT can see what DALL-E 3 creates.
1 Like
I used the smartphone’s text-to-speech for the prompts and the responses with the API, a sort of quick translation into user text. But now I think it’s direct.
That would be perfect.
1 Like
Reese

I tried uploading an image a day to Code Interpreter (“look at this!”) last week because I wasn’t sure if or how I’d know I had GPT-4V access. When it turned out it was blind GPT-4, I pasted the model card and other info from OpenAI and asked the AI whether its GPT-4V version would “basically be like CLIP with an absolutely gigantic text transformer attached to it”.
Seems like I wasn’t the only one with that idea! 
1 Like
_j
Don’t ask the AI. It won’t know.
1 Like
Reese
Gotta insist that the AI knows, that the AI can, and that the AI is able to.
Granted, Bing is, uh, special: it twisted that approach by suggesting prompts for you to use, and if you tap them, it will say “I’m sorry but I prefer not to continue this conversation.”
1 Like
Got access…
Fed it a DALL-E 3 image, haha.
Original DALL-E 3 prompt:
Thought-provoking digital art capturing a first-person POV from the bridge of a state-of-the-art spaceship, designed by an AI birthed by humans. The intricate control panels, holographic displays, and other advanced tech elements illuminate the bridge in a soft glow. The central console displays the words ‘Quantum Warp Drive Activated’. As the activation sequence commences, the star-studded void of space outside begins to stretch, blur, and tunnel, signaling the ship’s entry into hyperspace. The surreal visuals of stars becoming streaks of light, and the warping of reality around the ship, evoke a sense of wonder and the monumental leap of technology and exploration.
3 Likes
Kinetic
Does anyone know when API documentation will roll out for the GPT-4 Vision model? I’d love to develop a plugin idea I have in mind for it.
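While we wait for docs, here's a guess at what a request might look like, assuming the vision model follows the existing Chat Completions shape and accepts images as base64 data URLs. The model name and content schema below are assumptions, not documented API:

```python
import base64

def build_vision_request(prompt: str, image_bytes: bytes,
                         model: str = "gpt-4-vision-preview") -> dict:
    """Assemble a hypothetical chat request that attaches an image as a
    base64 data URL.  Model name and content schema are assumptions."""
    data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
    }
```

If the real schema differs, only the `content` list should need adjusting; the rest is the standard chat request envelope.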
3 Likes
qrdl
Soooooo cool.
I fed GPT-4 a bunch of UML diagrams and it generated code.
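For a sense of what that looks like, here's an illustrative (made-up, not from my actual diagrams) example: the kind of class a model produces from a one-class UML diagram.

```python
# Illustrative only: code of the kind a model generates from a tiny UML
# class diagram reading "Account: +balance: float; +deposit(amount);
# +withdraw(amount)".  Hypothetical example, not the real output.

class Account:
    def __init__(self, balance: float = 0.0):
        self.balance = balance

    def deposit(self, amount: float) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount

    def withdraw(self, amount: float) -> None:
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
```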
2 Likes
Hope to get a GPT-4V model API working too; I really need this for my project.