Why is native image markup still a hurdle for GPT models? And is OpenAI working on such capabilities?

Hi guys!

So, I posted this as a feature request exactly one year ago, asking if it would be possible to give GPT-4V the ability to mark up images. Since then, I have come to understand that GPT-4V wasn’t natively multimodal and would have needed a collection of additional tools/models to achieve that. However, I was just wondering: given that AI models have advanced so much and that GPT-4o is natively multimodal, why is this still a challenge for current models? And is this something we can expect OpenAI to incorporate into its next-generation models?

As an example, last year when I asked for this markup feature, I didn’t quite understand how to read a humidity chart (also known as a psychrometric chart), and I was hoping that instead of just telling me how to read it, GPT could show me by drawing lines and curves over the chart and then explaining it as a teacher would. This would mostly involve tracing over existing lines and curves, and obviously it wasn’t something GPT-4V was capable of. But despite the release of natively multimodal models and numerous "PhD-level" models, this still seems to be something AI struggles with. Why is that? And is this something we can expect OpenAI to address in any of its upcoming models?

I don’t know much about how these models work. However, something as simple as tracing a line over a curve and then moving it a few notches down seems like it should be pretty straightforward, given that even a child could do it. Moreover, can developers do anything to achieve the same thing using existing models such as GPT-4o or even GPT-4o-mini through the API?
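To make the developer side of my question concrete, here is a rough, untested sketch of the kind of workaround I imagine is possible today: ask GPT-4o to return approximate pixel coordinates for the overlay as JSON, then draw those lines client-side with Pillow. The file name chart.png, the prompt, and the JSON schema are placeholders I made up, and from what I've read the model returning usable pixel coordinates is exactly the shaky part.

```python
# Rough sketch (untested): ask GPT-4o for overlay coordinates as JSON,
# then draw them onto the chart locally with Pillow.
# Assumptions: "chart.png" exists, the model returns usable pixel
# coordinates (often the weak point), and the JSON schema below is
# invented purely for illustration.
import base64
import json

from openai import OpenAI
from PIL import Image, ImageDraw

# Load the chart and note its pixel size so the model knows which
# coordinate system to answer in.
img = Image.open("chart.png").convert("RGB")
width, height = img.size

with open("chart.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

prompt = (
    f"This psychrometric chart is {width}x{height} pixels. "
    'Return ONLY JSON of the form {"segments": [{"label": str, '
    '"points": [[x, y], ...]}]} describing the polylines a teacher '
    "would trace to show how to read the chart at 25 C dry-bulb "
    "temperature and 50% relative humidity."
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
    response_format={"type": "json_object"},
)

# Draw whatever coordinates come back onto a copy of the chart.
overlay = json.loads(response.choices[0].message.content)
draw = ImageDraw.Draw(img)
for seg in overlay.get("segments", []):
    pts = [tuple(p) for p in seg["points"]]
    if len(pts) >= 2:
        draw.line(pts, fill="red", width=3)
        draw.text(pts[0], seg.get("label", ""), fill="red")

img.save("chart_annotated.png")
```

Even then, getting the returned coordinates to line up accurately with the chart's actual curves seems to be where this falls apart, which I assume is related to the same limitation I'm asking about.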

Thank you so much to everyone who takes the time to read this!