ChatGPT goes Multimodal! Sound and vision is rolling out on ChatGPT

Foxalabs · September 25, 2023, 1:01pm

Exciting news update! ChatGPT goes multimodal!

banderson2 · September 25, 2023, 1:29pm

This does not mention API usage. When will I be able to submit images through the API?

Any information / estimates are helpful, thanks!

Foxalabs · September 25, 2023, 1:40pm

Nothing official yet, just need to be patient, I’m sure the API will follow soon.

lachie1 · September 25, 2023, 1:55pm

Very exciting news! I am looking forward to speaking with ChatGPT.

anon10827405 · September 25, 2023, 3:35pm

Wow. This is incredible. Although I haven’t received the update on my phone yet, I can’t wait to try out some of these features. Going on hikes, spotting birds, even discussing national wonders such as Machu Picchu just got so much more interesting.

Less than a year ago, Davinci convinced me that I had to remove the brake lines on my car just so that I could remove the rotor (bad), and didn’t suggest flushing the lines before driving off (very bad). So the good ol’ mechanic test will also be interesting. Although, looking at the report, it seems like the model heavily leans towards “Nope, not doing that”. Which, I think, is fair.

I am very interested in knowing how the API will work. Will it be possible to generate and return embeddings of images? I could embed images of mushrooms for my database & determine if they are safe to eat. Start with GPT identifying what it knows and then build on top of that.

But I am also worried by this. I really do appreciate their stance on identifying & discussing people. Using this, someone could track and publish the actual whereabouts of public figures through public camera systems.

N2U · September 25, 2023, 3:45pm

So exciting, can’t wait to try this!

We’re rolling out voice and images in ChatGPT to Plus and Enterprise users over the next two weeks. Voice is coming on iOS and Android (opt-in in your settings) and images will be available on all platforms.

banderson2 · September 25, 2023, 3:56pm

What makes you say this? To my knowledge, capabilities like web browsing were not released to the API. I understand they are very different things, just curious if you have some extra insight.

Foxalabs · September 25, 2023, 4:33pm

I would imagine people will want to include images with prompts now that ChatGPT can do it. Most of the other features can already be done, and to some extent so can multimodal, but it would be nice to have a fully integrated image and text API. We’ll have to wait and see.

_j · September 25, 2023, 4:41pm

March 2023, gdb. ( No, ChatCompletions does not support submitting a list. )

grandell1234 · September 25, 2023, 6:21pm

Reading through the documentation, I found that they gave beta access to Be My Eyes. I think it is amazing to see all the ways this wonderful new technology can help people.

wwwarmy38 · September 25, 2023, 9:23pm

So now it has eyes and ears. Much closer to having an actual understanding of what an apple is. Looking forward to trying that ASAP.

N2U · September 25, 2023, 11:05pm

What do you mean? It’s definitely an iPod.

BrianLovesAI · September 26, 2023, 2:24am

I’m eagerly awaiting the API. The fact that ChatGPT is becoming multimodal is truly amazing. However, without access to the APIs, my options are limited. Therefore, my current task is to persuade my boss and colleagues that the API isn’t available yet. Often, when they come across information from OpenAI, they assume the APIs are already prepared and stable.

N2U · September 26, 2023, 2:28am

You’ll have to brace yourself for a few more weeks

Plus and Enterprise users will get to experience voice and images in the next two weeks. We’re excited to roll out these capabilities to other groups of users, including developers, soon after.

(Emphasis is mine)

anon22939549 · September 26, 2023, 6:38am

Watching the video about image chatting from "ChatGPT can now see, hear, and speak" got me thinking…

The thumbnail shows the image zoomed in with a part circled. I initially thought this was going to be from ChatGPT.

While it was still very impressive, it got me thinking—how awesome would it be if you could send ChatGPT a picture of something and it could draw on the image (circles, arrows, etc) to point things out to you…

Especially if it was able to connect in to DALL-E to produce illustrated guides.

Hell, connect it to the Internet too.

In the future I imagine a model will:

1. Accept the picture of the bike
2. Identify the bike brand and model
3. Locate the manual for the bike
4. Provide detailed and illustrated step-by-step instructions for lowering the seat, including a picture and description of the required tool

In the far future maybe it’ll create a quick tutorial video where an avatar demonstrates lowering the seat on an exact copy of the bike…

supershaneski · September 26, 2023, 11:44pm

It is doable. The AI already knows the position of the object in the image. If you check other object/face recognition projects on the web, they usually show a bounding box around the detected parts, even in real time. But I hope that if they implement it in ChatGPT, they’ll use a scribbled circle, as if it were drawn with a pen or marker. It would be visually pleasing that way.
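
As a rough sketch of that idea (not how ChatGPT or any particular detector does it): given a bounding box like the ones recognition projects display, you can compute a circle that covers it and then wobble the outline a little so it looks drawn by hand. The box coordinates, function names, and jitter amounts below are all made up for illustration, and the actual drawing is left to whatever imaging library you use.

```python
import math
import random


def enclosing_circle(box, padding=0.0):
    """Return (cx, cy, r) for the smallest circle covering an axis-aligned box.

    `box` is (x_min, y_min, x_max, y_max) in pixels (an assumed format).
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2
    cy = (y_min + y_max) / 2
    # Half the box diagonal reaches every corner; padding adds breathing room.
    r = math.hypot(x_max - x_min, y_max - y_min) / 2 + padding
    return cx, cy, r


def sketchy_outline(cx, cy, r, points=48, wobble=0.03, seed=0):
    """Approximate a pen-drawn circle as a closed polyline with radial jitter."""
    rng = random.Random(seed)  # seeded so the same outline is reproducible
    outline = []
    for i in range(points):
        angle = 2 * math.pi * i / points
        # Jitter outward only, so every vertex stays on or outside the true circle.
        radius = r * (1 + rng.uniform(0, wobble))
        outline.append((cx + radius * math.cos(angle),
                        cy + radius * math.sin(angle)))
    outline.append(outline[0])  # repeat the first point to close the loop
    return outline


# Hypothetical detection result: a box around the part to point out.
cx, cy, r = enclosing_circle((40, 60, 200, 180))
outline = sketchy_outline(cx, cy, r)
```

The resulting list of points could be passed to a polyline-drawing call in an image library to overlay the circle on the photo.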

tamas.simon · September 27, 2023, 12:05am

When a plugin generates an image, how can the model see it?

anon22939549 · September 27, 2023, 12:07am

Oh, it’s absolutely doable but it’s another layer on top of what they’re already working on.

I don’t expect we’ll see it this year, or even maybe next.

grandell1234 · September 27, 2023, 12:13am

I agree. OpenAI’s team creates incredible programs, but it also takes time. After GPT-4’s release they had more time to work on DALL-E and recently announced DALL-E 3. They are switching their main focus between different programs, and I personally am fine with it.

anon22939549 · September 27, 2023, 12:29am

I’m pretty sure they are mostly different teams working on their own products. The underlying technologies are very different.
