ChatGPT goes Multimodal! Sound and vision is rolling out on ChatGPT

anon10827405 · September 25, 2023, 3:35pm

Wow. This is incredible. Although I haven’t received the update on my phone yet, I can’t wait to try out some of these features. Going on hikes, spotting birds, even discussing national wonders such as Machu Picchu just got so much more interesting.

Less than a year ago, Davinci convinced me that I had to remove the brake lines on my car just to remove the rotor (bad), and didn’t suggest flushing the lines before driving off (very bad). So the good ol’ mechanic test will also be interesting. Although, looking at the report, it seems like the model leans heavily towards “Nope, not doing that”. Which, I think, is fair.

I am very interested in knowing how the API will work. Will it be possible to generate and return embeddings of images? I could embed images of mushrooms for my database & determine if they are safe to eat. Start with GPT identifying what it knows and then build on top of that.
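
A minimal sketch of that lookup idea, assuming some endpoint eventually returns one vector per image (that part is not confirmed). The vectors, labels, `nearest_label`, and the `min_similarity` threshold below are all made up for illustration; real embeddings would have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_label(query, database, min_similarity=0.9):
    """Return (label, score) for the closest stored vector.

    The label is None when even the best match scores below
    min_similarity, so an unfamiliar image is reported as unknown
    instead of being forced onto the nearest entry.
    """
    best_label, best_score = None, -1.0
    for label, vector in database:
        score = cosine_similarity(query, vector)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < min_similarity:
        return None, best_score
    return best_label, best_score


# Hypothetical 3-dimensional stand-ins for real image embeddings.
database = [
    ("chanterelle", [0.9, 0.1, 0.0]),
    ("death cap", [0.1, 0.9, 0.2]),
]

label, score = nearest_label([0.88, 0.12, 0.01], database)
print(label, round(score, 3))
```

The threshold is the part that matters for the mushroom case: a query that is not close to anything in the database comes back as `None`, not as whichever entry happens to be nearest.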

But I am also worried by this. I really do appreciate their stance on identifying & discussing people. Using this, someone could track and publish the actual whereabouts of public figures through public camera systems.
