Hi everyone,
I’ve been experimenting with image analysis using OpenAI models, and I noticed that most existing discussions about this topic are quite old.
I’m particularly curious about the current performance, cost, and reliability differences between GPT-4o and GPT-4o Mini for tasks like:
- Menu parsing / text extraction from images
- Object recognition or classification
- Image captioning
Has anyone tried both recently? Which one would you recommend in terms of:
- Accuracy / comprehension of image content
- Cost efficiency
- Speed and API responsiveness
Also, are there any newer multimodal models or tips for achieving better structured outputs from images?
Would love to hear your real-world experiences and benchmarks. Thanks!