We would like to use the OpenAI API with a vision-capable model to analyze photographs of insurance events.
We initially tested the concept in ChatGPT and concluded that the results were good enough to proceed with an API-based implementation.
After switching to the API (with the same prompts!), our conclusions are:
1) The gpt-4o model does not process photographs at all.
2) Comparing the gpt-4o-mini model between ChatGPT and the API, the API results are significantly worse.
Point 1:
We have tested the API call more than five times and the response is fully consistent. We receive the following:
"I'm sorry, but I cannot provide an analysis of photographs or visual content as you requested. My capabilities are limited to text-based analysis and generation…"
We have also tried gpt-4o with detail = "high"; the result is the same. We are really confused by this behavior.
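For context, this is roughly the request shape we are sending, reduced to a minimal sketch (the prompt and image URL below are placeholders, not our actual data). We understand the image must arrive as an `image_url` content part rather than as plain text:

```python
def build_vision_request(prompt: str, image_url: str, detail: str = "high") -> dict:
    """Build the JSON body for a single-image Chat Completions vision request.

    Placeholder prompt and image URL; the real client data cannot be shared.
    """
    return {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                "content": [
                    # The text instruction and the image travel as separate
                    # content parts inside one user message.
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": image_url, "detail": detail},
                    },
                ],
            }
        ],
    }

body = build_vision_request(
    "Describe the damage in this photo.",          # placeholder prompt
    "https://example.com/claim-photo.jpg",         # placeholder URL
)
# The body is then passed to the official client, e.g.:
# client.chat.completions.create(**body)
```

If the image ends up embedded in the text field instead of an `image_url` part, the model only ever sees a string, which would match the refusal we are getting.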
Point 2:
Using the same prompt and the same images, we compared the results of the gpt-4o-mini model in ChatGPT and via the API. We noticed a significant quality difference: ChatGPT is reasonably good, while my manager qualified the API results as "unusable".
For instance, ChatGPT can distinguish a "gazebo" from a "house", while the API, given the same prompt, consistently classifies both as "house".
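For completeness, this is how we attach the local photographs, sketched with a hypothetical file name (our understanding is that the `detail` setting in the previous sketch governs how much image resolution the model actually receives, so we encode the originals without downscaling):

```python
import base64

def image_to_data_url(path: str, mime: str = "image/jpeg") -> str:
    """Encode a local photo as a base64 data URL for an image_url part.

    `path` is a hypothetical local file; no resizing is done here, so the
    model receives the photograph at its original resolution.
    """
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"

# Usage (hypothetical file name):
# url = image_to_data_url("claim_photo_01.jpg")
# ...then pass `url` as the "url" field of the image_url content part.
```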
Unfortunately, I cannot publish the client photographs in this public forum, and the prompt alone probably would not help without them.
Can you advise what to do now? We have invested significant effort in this solution, and the final results are not good…