Watching the "ChatGPT can now see, hear, and speak" video about image chatting got me thinking…
The thumbnail shows the image zoomed in with a part circled. I initially thought this was going to be from ChatGPT.
While it was still very impressive, it got me thinking: how awesome would it be if you could send ChatGPT a picture of something and it could draw on the image (circles, arrows, etc.) to point things out to you…
Especially if it were able to connect to DALL-E to produce illustrated guides.
Hell, connect it to the Internet too.
In the future I imagine a model will (rough sketch below):
- Accept the picture of the bike
- Identify the bike brand and model
- Locate the manual for the bike
- Provide detailed and illustrated step-by-step instructions for lowering the seat including a picture and description of the required tool
In the far future, maybe it'll create a quick tutorial video where an avatar demonstrates lowering the seat on an exact copy of the bike…
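Roughly what I'm picturing, as stubbed-out Python. Every helper, name, and value here is hypothetical; it's just meant to show the flow, not any real API.

```python
# Hypothetical pipeline for the bike example above. None of these
# helpers exist today; they stand in for a future multimodal model
# plus web-search and DALL-E integrations.
from dataclasses import dataclass


@dataclass
class Step:
    text: str       # e.g. "Loosen the seat clamp bolt"
    image_url: str  # illustration generated for this step


def identify_bike(photo_path: str) -> str:
    """Vision model guesses the brand and model from the photo (stub)."""
    return "Acme Commuter 3 (2021)"


def find_manual(bike_model: str) -> str:
    """Web search / retrieval of the official manual (stub)."""
    return "https://example.com/acme-commuter-3-manual.pdf"


def build_guide(manual_url: str, task: str) -> list[Step]:
    """LLM reads the manual; an image model illustrates each step (stub)."""
    return [
        Step("Locate the seat clamp under the saddle.", "https://example.com/step1.png"),
        Step("Loosen the clamp bolt with a 5 mm hex key.", "https://example.com/step2.png"),
        Step("Lower the seat post and retighten the bolt.", "https://example.com/step3.png"),
    ]


if __name__ == "__main__":
    bike = identify_bike("bike_photo.jpg")
    manual = find_manual(bike)
    for i, step in enumerate(build_guide(manual, "lower the seat"), start=1):
        print(f"{i}. {step.text} ({step.image_url})")
```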
It is doable. The AI already knows the position of objects in the image. If you check other object/face recognition projects on the web, they usually show a bounding box around the detected parts, even in real time. But if they implement it in ChatGPT, I hope they'll use a freehand circle, as if it were drawn with a pen or marker. It would be more visually pleasing that way.
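If anyone wants to play with that today, here's a minimal sketch assuming you already have a bounding box from some detector; the coordinates and file names below are made up.

```python
# Draw a marker-style circle around a detected region with Pillow.
from PIL import Image, ImageDraw

image = Image.open("bike_photo.jpg").convert("RGB")
draw = ImageDraw.Draw(image)

# (left, top, right, bottom) box from your detector of choice
x0, y0, x1, y1 = 420, 310, 560, 450

# Pad the box a little and draw a thick ellipse so it reads more like
# a hand-drawn circle than a clinical bounding box.
pad = 15
draw.ellipse(
    (x0 - pad, y0 - pad, x1 + pad, y1 + pad),
    outline=(220, 30, 30),
    width=6,
)

image.save("bike_photo_annotated.jpg")
```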
When a plugin generates an image, how can the model see it?
Oh, it's absolutely doable, but it's another layer on top of what they're already working on.
I don't expect we'll see it this year, or maybe even next.
I agree. OpenAI's team creates incredible programs, but it also takes time. After GPT-4's release they had more time to work on DALL-E and recently announced DALL-E 3. They are switching their main focus between different programs, and I personally am fine with it.
I'm pretty sure they are mostly different teams working on their own products. The underlying technologies are very different.
I've been following Prafulla Dhariwal on Twitter since the "Jukebox" days (2020). He's definitely involved with multiple projects.
https://twitter.com/prafdhar
Bio: "Co-creator of GPT-3, DALL-E 2, Jukebox, Glow, PPO. Researcher"
Do note I wrote "mostly", not "entirely". There's absolutely overlap; my point was more to illustrate that OpenAI can and does work on multiple products simultaneously.
Yeah, sorry, I almost edited to add that they can and definitely do multitask with lots of brilliant people. It was just an opportunity to mention Prafulla, whom I totally admire.
Gadcuit
I think it should be open to users who don't use GPT-4 as well. The reason is that this kind of capability should already be available across all the NLP models.
I hope it's coming sooner. I think the building blocks are ready. DALL-E 3 works like that, back and forth… I'm sure GPT can see what DALL-E 3 creates.
I used the smartphones' voice synthesis for the prompts and the responses with the API, a sort of quick translation into text for the user. But here I think it's handled directly.
That would be perfect.
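For reference, the round trip looks roughly like this with the pre-1.0 openai Python library; the file name and model choice are placeholders, and the device's own text-to-speech does the actual speaking.

```python
# Voice round trip: transcribe the spoken prompt with Whisper, send the
# text to the chat API, then hand the reply to the phone's text-to-speech.
import openai

openai.api_key = "sk-..."  # your API key

# 1. Speech -> text (Whisper)
with open("question.m4a", "rb") as audio_file:
    transcript = openai.Audio.transcribe("whisper-1", audio_file)
user_text = transcript["text"]

# 2. Text -> answer (chat completion)
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": user_text}],
)
answer = response["choices"][0]["message"]["content"]

# 3. Pass `answer` to the device's text-to-speech engine to read it aloud.
print(answer)
```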
Reese
I tried just uploading an image a day to code interpreter ("look at this!") last week because I wasn't sure if and how I'd know I have GPT-4V access. When it turned out it was blind GPT-4, I pasted the model card and other info from OpenAI and just asked the AI if its GPT-4V version would "basically be like CLIP with an absolutely gigantic text transformer attached to it".
Seems like I wasn't the only one with that idea!
_j
Don't ask the AI. It won't know.
Reese
Gotta insist that the AI knows, that the AI can, and that the AI is able to.
Granted, Bing is, uh, special: it twisted that approach by suggesting prompts for you to use, and if you tap them, it will say "I'm sorry but I prefer not to continue this conversation."
Got access…
Fed it a DALL-E 3 image haha
Original DALL-E 3 prompt…
Thought-provoking digital art capturing a first-person POV from the bridge of a state-of-the-art spaceship, designed by an AI birthed by humans. The intricate control panels, holographic displays, and other advanced tech elements illuminate the bridge in a soft glow. The central console displays the words "Quantum Warp Drive Activated". As the activation sequence commences, the star-studded void of space outside begins to stretch, blur, and tunnel, signaling the ship's entry into hyperspace. The surreal visuals of stars becoming streaks of light, and the warping of reality around the ship, evoke a sense of wonder and the monumental leap of technology and exploration.
Kinetic
Does anyone know when API documentation will roll out for the GPT-4 Vision model? I would love to develop a plugin idea I have in mind for it.
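No official docs yet as far as I know. My guess is it will slot into the existing chat completions format, something like the sketch below; the model name and message shape are pure speculation until the documentation lands.

```python
# Speculative request shape for a vision-capable chat call: text plus an
# image reference in a single user message. Adjust once real docs exist.
import openai

openai.api_key = "sk-..."

response = openai.ChatCompletion.create(
    model="gpt-4-vision",  # placeholder model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What bike is this, and how do I lower the seat?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/bike.jpg"}},
        ],
    }],
)
print(response["choices"][0]["message"]["content"])
```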
qrdl
Soooooo cool.
I fed GPT-4 a bunch of UML diagrams and it generated code.
Hope to get a GPT-4V model API working too; I really need this for my project.