ChatGPT goes Multimodal! Sound and vision are rolling out on ChatGPT

Wow. This is incredible. Although I haven’t received the update on my phone yet, I can’t wait to try out some of these features. Going on hikes, spotting birds, and even discussing wonders such as Machu Picchu just got so much more interesting :heart_eyes:

Less than a year ago, Davinci convinced me that I had to remove the brake lines on my car just to remove the rotor (bad), and it didn’t suggest bleeding the lines before driving off (very bad). So the good ol’ mechanic test will also be interesting. That said, judging by the report, the model seems to lean heavily towards “Nope, not doing that”. Which, I think, is fair.

I am very interested in knowing how the API will work. Will it be possible to generate and return embeddings of images? I could embed images of mushrooms for my database and determine whether they are safe to eat. I would start with GPT identifying what it already knows and then build on top of that.

But I am also worried about this. I really do appreciate their stance on identifying and discussing people. With this, someone could track and publish the actual whereabouts of public figures through public camera systems.