It would be “easy”, but not easy for gpt-4-vision. Grounding, bounding boxes, entity identification, etc. are not part of what the model does.
Azure, for example, can layer different vision models to perform such a task.