Through the API we are trying to detect if an image contains a human, however we are getting constant random false negatives.
I would presume just checking if a human is on an image was one of the more simple vision tasks - anyone else having issues like this?
Tried optimizing my prompts to check for everything i could think off that ties to the human and the token amount just increases and increases with limited improvement
{
‘model’: model,
‘messages’: [{
‘role’: ‘user’,
‘content’: [
{“type”: “text”, “text”: 'extract precise values from the image, ’
‘according to the description of the parameters’},
{“type”: “image_url”, “image_url”: {‘url’: image_url}}
]
}],
‘tools’: {
‘type’: ‘function’,
‘function’: {
‘name’: ‘submit_image_analysis_result’,
‘description’: ‘Extract information about the image, in the parameters’,
‘parameters’: {
‘type’: ‘object’,
‘properties’: {
‘heel_side’: {
‘type’: [‘string’, ‘null’],
‘description’: ‘If the image contains a shoe or a pair of shoes with no human, person, man, woman or child in the image and the heel of the shoe or pair of shoes is on the left or the laces are closest to the left then submit left\mIf the image contains a shoe or a pair of shoes with no human, person, man, woman or child in the image and the heel of the shoe or pair of shoes is on the right or the laces are closest to the right then submit right’,
‘enum’: [‘left’, ‘right’]
},
‘get_photo_type’: {
‘type’: [‘string’, ‘null’],
‘description’: ‘If the image contains:\na mannequin or part of a mannequin, a man, a woman, a child, a human, a person, legs, arms or a head\nor \nthe fabric of an item with no whitespace in the background\nor \nthe fabric of an item with a logo on it and no whitespace in the background\nor \na sample of a cosmetic product to showcase the texture of the product\nor\na powered substance\nor \nan item in a setting with other items or fruits\n\nthe submit “lifestyle” \n\nor \n\na single product or item on a plain background then submit “packshot”’,
‘enum’: [‘lifestyle’, ‘packshot’, ‘other’]
},
}
Thank you for sharing the details. I recommend switching to gpt-4o-2024-11-20. Furthermore, I suggest using structured outputs to ensure that the generated tool calls adhere to the specifications outlined in the schema.