GPT for human pose estimation

Hi, has anyone tried using the vision capabilities to perform human pose estimation? Traditionally, this is done with models like OpenPose, MediaPipe, PoseNet, etc., and involves taking an image or video of a person and identifying body landmarks such as the shoulders, hips, knees, ankles, and face. I have been looking into whether OpenAI's vision capabilities can do this. If you have tried it, please do let me know :wink: