I imagine people will want to include images with their prompts now that ChatGPT can do it. Most of the other features can already be replicated, and to some extent so can multimodal, but it would be nice to have a fully integrated image-and-text API. We'll have to wait and see.
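For what it's worth, here is a sketch of what a combined image-and-text request *might* look like. Everything here is an assumption for illustration: the list-of-parts `content` shape and the `image_url` field are made up, not a published OpenAI API.

```python
import json

# Hypothetical multimodal message payload. The field names below
# ("content" as a list of parts, "image_url") are assumptions for
# illustration only, not a documented API shape.
def build_multimodal_message(text, image_url):
    """Combine a text prompt and an image reference in one user message."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

message = build_multimodal_message(
    "What is shown in this picture?",
    "https://example.com/apple.jpg",
)
print(json.dumps(message, indent=2))
```

Whatever the real API ends up looking like, something along these lines (one message carrying both modalities) would make integration straightforward.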
1 Like
_j
9
March 2023, gdb. (No, ChatCompletions does not support submitting a list.)
1 Like
Reading through the documentation I found that they gave beta access for Be My Eyes. I think it is amazing seeing all the ways this wonderful new technology can help people.
1 Like
So now it has eyes and ears. Much closer to having an actual understanding of what an apple is. Looking forward to trying that ASAP.
3 Likes
N2U
12
What do you mean? It's definitely an iPod.
7 Likes
I'm eagerly awaiting the API. The fact that ChatGPT is becoming multimodal is truly amazing. However, without access to the APIs, my options are limited. So my current task is persuading my boss and colleagues that the API isn't available yet. Often, when they come across news from OpenAI, they assume the APIs are already ready and stable.
1 Like
N2U
14
You'll have to brace yourself for a few more weeks:
Plus and Enterprise users will get to experience voice and images in the next two weeks. We’re excited to roll out these capabilities to other groups of users, including developers, soon after.
(Emphasis is mine)
2 Likes
Watching the [ChatGPT can now see, hear, and speak](video about image chatting) video got me thinking…
The thumbnail shows the image zoomed in with a part circled. I initially thought this was going to be from ChatGPT.
While it was still very impressive, it got me thinking—how awesome would it be if you could send ChatGPT a picture of something and it could draw on the image (circles, arrows, etc) to point things out to you…
Especially if it could connect to DALL-E to produce illustrated guides.
Hell, connect it to the Internet too.
In the future I imagine a model will:
- Accept the picture of the bike
- Identify the bike brand and model
- Locate the manual for the bike
- Provide detailed and illustrated step-by-step instructions for lowering the seat including a picture and description of the required tool
In the far future maybe it’ll create a quick tutorial video where an avatar demonstrates lowering the seat on an exact copy of the bike…
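The steps above can be sketched as a pipeline. Every function here is a hypothetical stub standing in for a model or service (vision identification, manual retrieval, instruction generation); the bike brand, model, URL, and steps are all made-up placeholder data.

```python
# Sketch of the imagined bike-manual pipeline. Each function is a
# hypothetical stub for a capability that isn't a single API today;
# all data is hard-coded for illustration.

def identify_bike(image_bytes):
    # Stand-in for a vision model identifying brand and model.
    return {"brand": "Acme", "model": "Trail-3000"}

def locate_manual(bike):
    # Stand-in for a web search / retrieval step.
    return f"https://example.com/manuals/{bike['brand']}-{bike['model']}.pdf"

def seat_lowering_steps(manual_url):
    # Stand-in for the model reading the manual and writing instructions.
    return [
        "Open the quick-release lever on the seat clamp.",
        "Slide the seat post down to the desired height.",
        "Close the lever and check that the seat no longer twists.",
    ]

def assist(image_bytes):
    bike = identify_bike(image_bytes)
    manual = locate_manual(bike)
    return {"bike": bike, "manual": manual, "steps": seat_lowering_steps(manual)}

result = assist(b"\x89fake-image-bytes")
for step in result["steps"]:
    print("-", step)
```

The interesting part is the plumbing, not any one step: each stage's output feeds the next, which is exactly what a tool-using model would orchestrate.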
2 Likes
It is doable. The AI already knows the position of objects in the image. If you look at other object/face-recognition projects on the web, they usually show a bounding box around the detected parts, even in real time. But if they implement it in ChatGPT, I hope they'll use a circumscribed circle, as if it were drawn with a pen or marker. It would be more visually pleasing that way.
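Detectors typically return a rectangular bounding box, and turning that into a circled annotation is just geometry: take the smallest circle that encloses the box. A minimal sketch in plain Python, with hypothetical detection coordinates:

```python
import math

def circumscribe(box):
    """Given a detection box (x0, y0, x1, y1), return the center and
    radius of the smallest circle enclosing it -- the circle you'd
    draw around the detected part instead of a rectangle."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    radius = math.hypot(x1 - x0, y1 - y0) / 2  # half the box diagonal
    return (cx, cy), radius

# Example: a detector flags a region at (40, 30)-(120, 90).
center, r = circumscribe((40, 30, 120, 90))
print(center, r)  # (80.0, 60.0) 50.0
```

A real implementation would then render that circle onto the image (and could jitter the path a bit for the hand-drawn marker look), but the annotation data itself is this simple.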
1 Like
When a plugin generates an image, how can the model see it?
1 Like
Oh, it’s absolutely doable but it’s another layer on top of what they’re already working on.
I don't expect we'll see it this year, or maybe even next.
1 Like
I agree. OpenAI's team creates incredible products, but that also takes time. After GPT-4's release they had more time to work on DALL-E and recently announced DALL-E 3. They switch their main focus between different products, and I personally am fine with it.
1 Like
I’m pretty sure they are mostly different teams working on their own products. The underlying technologies are very different.
1 Like
I've been following Prafulla Dhariwal on Twitter since the "Jukebox" days (2020). He's definitely involved with multiple projects.
https://twitter.com/prafdhar
Bio, " Co-creator of GPT-3, DALL-E 2, Jukebox, Glow, PPO. Researcher"
1 Like
Do note I wrote mostly, not entirely. There's absolutely overlap; my point was more to illustrate that OpenAI can and does work on multiple products simultaneously.
1 Like
Yeah, sorry. I almost edited my post to add that they can and definitely do multitask, with lots of brilliant people. It was just an opportunity to mention Prafulla, whom I totally admire.
1 Like
Gadcuit
24
I think it should be open to users who don't use GPT-4 as well. The reason is that it should already be available across all the NLP models.
2 Likes
I hope it's coming sooner. I think the building blocks are ready: DALL-E 3 works like that, back and forth… I'm sure GPT can see what DALL-E 3 creates.
1 Like
I used the smartphone's voice synthesis for prompts and responses with the API, a kind of quick translation into user text. But here I think it's direct.
That would be perfect.
1 Like
Reese
27

Last week I tried uploading an image a day to Code Interpreter ("look at this!") because I wasn't sure if, or how, I'd know I had GPT-4V access. When it turned out to be blind GPT-4, I pasted the model card and other info from OpenAI and just asked the AI whether its GPT-4V version would "basically be like CLIP with an absolutely gigantic text transformer attached to it".
Seems like I wasn’t the only one with that idea! 
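That "CLIP with a giant text transformer attached" guess matches how several open vision-language models are wired: an image encoder's feature vector is projected into the language model's token-embedding space and treated as extra "tokens". A toy sketch of just that projection step; the dimensions and the random weight matrix are entirely made up:

```python
import random

random.seed(0)

IMG_DIM, TXT_DIM = 4, 6  # toy sizes; real models use hundreds or thousands

# Made-up linear projection from image-feature space into the
# language model's token-embedding space.
W = [[random.uniform(-1, 1) for _ in range(TXT_DIM)] for _ in range(IMG_DIM)]

def project(image_features):
    """Map a CLIP-style image vector to a pseudo token embedding."""
    return [sum(f * W[i][j] for i, f in enumerate(image_features))
            for j in range(TXT_DIM)]

image_features = [0.5, -1.0, 0.25, 0.75]   # pretend encoder output
visual_token = project(image_features)
print(len(visual_token))  # 6 -- same width as a text token embedding
```

Once the image lives in the same embedding space as text, the transformer attends over both kinds of "tokens" identically, which is what makes the idea feel like CLIP bolted onto a big language model.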
1 Like