Reading through the documentation, I found that they gave beta access to Be My Eyes. I think it’s amazing to see all the ways this wonderful new technology can help people.

1 Like

So now it has eyes and ears. Much closer to having an actual understanding of what an apple is. Looking forward to trying that ASAP.

3 Likes

What do you mean? It’s definitely an iPod!

7 Likes

I’m eagerly awaiting the API. The fact that ChatGPT is becoming multimodal is truly amazing. However, without access to the APIs, my options are limited. Therefore, my current task is to persuade my boss and colleagues that the API isn’t available yet. Often, when they come across information from OpenAI, they assume the APIs are already prepared and stable. :sweat_smile:

1 Like

You’ll have to brace yourself for a few more weeks :wink:

Plus and Enterprise users will get to experience voice and images in the next two weeks. We’re excited to roll out these capabilities to other groups of users, including developers, soon after.

(Emphasis is mine)

2 Likes

Watching the “ChatGPT can now see, hear, and speak” video (the one about image chatting) got me thinking…

The thumbnail shows the image zoomed in with a part circled. I initially thought this was going to be from ChatGPT.

While it was still very impressive, it got me thinking—how awesome would it be if you could send ChatGPT a picture of something and it could draw on the image (circles, arrows, etc) to point things out to you…

Especially if it were able to connect to DALL-E to produce illustrated guides.

Hell, connect it to the Internet too.

In the future, I imagine a model will (rough sketch in code below):

  • Accept the picture of the bike
  • Identify the bike brand and model
  • Locate the manual for the bike
  • Provide detailed and illustrated step-by-step instructions for lowering the seat, including a picture and description of the required tool

In the far future maybe it’ll create a quick tutorial video where an avatar demonstrates lowering the seat on an exact copy of the bike…
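
Purely as a sketch of how that flow might be orchestrated (every helper below is a hypothetical placeholder, not a real SDK call; they stand in for a vision model, a web search tool, and an image generator):

```python
# Hypothetical skeleton of the bike-manual flow imagined above.
# None of these helpers exist in any current SDK; the return values are
# hard-coded placeholders so the skeleton actually runs end to end.

def identify_bike(photo_path: str) -> str:
    """Would ask a vision-capable model for the brand/model shown in the photo."""
    return "ExampleBrand Commuter 3"  # placeholder

def find_manual(bike_model: str) -> str:
    """Would search the web for the official manual and return its URL."""
    return "https://example.com/manuals/commuter-3.pdf"  # placeholder

def illustrated_steps(manual_url: str, task: str) -> list[dict]:
    """Would extract the relevant section and generate one illustration per step."""
    return [{"text": f"Step 1 of '{task}' (see {manual_url})", "image": "step1.png"}]

def lower_the_seat(photo_path: str) -> list[dict]:
    bike = identify_bike(photo_path)            # accept the picture, identify the bike
    manual = find_manual(bike)                  # locate the manual
    return illustrated_steps(manual, "lower the seat")  # illustrated instructions

print(lower_the_seat("my_bike.jpg"))
```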

2 Likes

It is doable. The AI already knows the position of the object in the image. If you look at other object/face recognition projects on the web, they usually show a bounding box around the detected parts, even in real time. But if they implement this in ChatGPT, I hope they’ll use a scribed circle, as if it were drawn with a pen or marker. It would be more visually pleasing that way.
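
A quick sketch of what that could look like client-side, assuming Pillow; the bounding box coordinates in the usage comment are made up and would normally come from whatever detector or vision model spotted the object:

```python
from PIL import Image, ImageDraw

def circle_region(image_path: str, box: tuple[int, int, int, int], out_path: str) -> None:
    """Draw a marker-style ellipse around a detected region.

    `box` is (left, top, right, bottom) in pixels, e.g. from an object detector.
    """
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    pad = 10  # pad the box a little so the "circle" doesn't clip the object
    left, top, right, bottom = box
    draw.ellipse(
        (left - pad, top - pad, right + pad, bottom + pad),
        outline=(220, 40, 40),  # marker red
        width=6,
    )
    img.save(out_path)

# The coordinates below are invented for illustration.
# circle_region("bike.jpg", (340, 120, 520, 300), "bike_annotated.jpg")
```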

1 Like

When a plugin generates an image, how can the model see it?

1 Like

Oh, it’s absolutely doable but it’s another layer on top of what they’re already working on.

I don’t expect we’ll see it this year, or maybe even next.

1 Like

I agree. OpenAI’s team creates incredible products, but it also takes time. After GPT-4’s release they had more time to work on DALL-E, and they recently announced DALL-E 3. They’re switching their main focus between different products, and I personally am fine with that.

1 Like

I’m pretty sure they are mostly different teams working on their own products. The underlying technologies are very different.

1 Like

I’ve been following Prafulla Dhariwal on Twitter since the “Jukebox” days (2020). He’s definitely involved with multiple projects.

https://twitter.com/prafdhar

Bio: “Co-creator of GPT-3, DALL-E 2, Jukebox, Glow, PPO. Researcher”

1 Like

Do note I wrote mostly, not entirely. There’s absolutely overlap; my point was more to illustrate that OpenAI can and does work on multiple products simultaneously.

1 Like

Yeah, sorry, I almost edited to add that they can and definitely do multitask with lots of brilliant people. It was just an opportunity to mention Prafulla, whom I totally admire.

1 Like

I think it should be open to users who aren’t on GPT-4 as well. This kind of capability should be available across all the models, not just GPT-4.

2 Likes

I hope it’s coming sooner. I think the building blocks are ready. DALL-E 3 already works like that, back and forth… I’m sure GPT can see what DALL-E 3 creates.

1 Like

I used the smartphones’ voice synthesis for the prompts and the responses with the API, a sort of quick translation into user text. But this, I think, is direct.
That would be perfect.
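
For what it’s worth, here’s a minimal sketch of that round trip, assuming the Python openai client (v1 interface) and the Whisper transcription endpoint; the speak() helper is just a stand-in for whatever text-to-speech the phone provides:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def speak(text: str) -> None:
    """Stand-in for the phone/OS text-to-speech described above."""
    print(text)

def voice_turn(audio_path: str) -> str:
    # 1. Transcribe the recorded prompt with the Whisper API.
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=f)
    # 2. Send the transcribed text to a chat model.
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": transcript.text}],
    )
    answer = reply.choices[0].message.content
    # 3. Read the answer back out loud (here: just print it).
    speak(answer)
    return answer
```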

1 Like

:joy:

I tried just uploading an image a day to Code Interpreter (“look at this!”) last week because I wasn’t sure if, or how, I’d know I have GPT-4V access. When it turned out to be the blind GPT-4, I pasted the model card and other info from OpenAI and just asked the AI whether its GPT-4V version would “basically be like CLIP with an absolutely gigantic text transformer attached to it”.

Seems like I wasn’t the only one with that idea! :smile_cat:

1 Like

Don’t ask the AI. It won’t know.

1 Like

Gotta insist that the AI knows, that the AI can, and that the AI is able to. :grin:

Granted, Bing is, uh, special: it twisted that approach by suggesting prompts for you to use, and if you tap them, it says “I’m sorry but I prefer not to continue this conversation :pray:”.

1 Like