Detecting if an image contains a human fails randomly?

Through the API we are trying to detect if an image contains a human, however we are getting constant random false negatives.

I would presume just checking if a human is on an image was one of the more simple vision tasks - anyone else having issues like this?

Tried optimizing my prompts to check for everything i could think off that ties to the human and the token amount just increases and increases with limited improvement

Note that the images purely contains a human with clothes on - so they are not hidden in any way. It’s the sole focus of the image.


Example included

Hello @sm4,

This task is relatively straightforward. Could you please confirm the model name you are using and provide a sample of the API call you are making?

Using the model gpt-4-turbo

One of our calls are like this:

POST /v1/chat/completions

{
‘model’: model,
‘messages’: [{
‘role’: ‘user’,
‘content’: [
{“type”: “text”, “text”: 'extract precise values from the image, ’
‘according to the description of the parameters’},
{“type”: “image_url”, “image_url”: {‘url’: image_url}}
]
}],
‘tools’: {
‘type’: ‘function’,
‘function’: {
‘name’: ‘submit_image_analysis_result’,
‘description’: ‘Extract information about the image, in the parameters’,
‘parameters’: {
‘type’: ‘object’,
‘properties’: {
‘heel_side’: {
‘type’: [‘string’, ‘null’],
‘description’: ‘If the image contains a shoe or a pair of shoes with no human, person, man, woman or child in the image and the heel of the shoe or pair of shoes is on the left or the laces are closest to the left then submit left\mIf the image contains a shoe or a pair of shoes with no human, person, man, woman or child in the image and the heel of the shoe or pair of shoes is on the right or the laces are closest to the right then submit right’,
‘enum’: [‘left’, ‘right’]
},
‘get_photo_type’: {
‘type’: [‘string’, ‘null’],
‘description’: ‘If the image contains:\na mannequin or part of a mannequin, a man, a woman, a child, a human, a person, legs, arms or a head\nor \nthe fabric of an item with no whitespace in the background\nor \nthe fabric of an item with a logo on it and no whitespace in the background\nor \na sample of a cosmetic product to showcase the texture of the product\nor\na powered substance\nor \nan item in a setting with other items or fruits\n\nthe submit “lifestyle” \n\nor \n\na single product or item on a plain background then submit “packshot”’,
‘enum’: [‘lifestyle’, ‘packshot’, ‘other’]
},
}

1 Like

Thank you for sharing the details. I recommend switching to gpt-4o-2024-11-20. Furthermore, I suggest using structured outputs to ensure that the generated tool calls adhere to the specifications outlined in the schema.

4 Likes

We will try and implement as suggested to see if it makes a difference!

Will report back if you helped us maintain a few hairstrings on our head :slight_smile:

1 Like

We still seem to fail a match on your suggestion with the prompt:

If the image contains a mannequin, a man, a woman or a child wearing clothing then return “lifestyle”

We get random images returned with null instead :frowning:

One of the images giving null: