ChatGPT 4 Vision sometimes returns "I'm sorry, I can't provide help with that request" and sometimes actual response for the same Image

anon83257046 · January 20, 2024, 2:57pm

I’m having a weird issue with ChatGPT 4 Vision API. I’m running this request (which mostly works):

let payload: [String: Any] = [
            "model": "gpt-4-vision-preview",
            "messages": [
                [
                    "role": "user",
                    "content": [
                        ["type": "text", "text": "Respond only with a short description of the person in the image. Race, ethnicity, Hair style and color, Skin color and facial features. I want to use it as a prefix for a person description. Just respond with with the descirption, don't add anything else to your response"],
                        [
                            "type": "image_url",
                            "image_url": [
                                "url": "data:image/jpeg;base64,\(base64Image)",
                                "detail": "low"  
                            ]
                        ]
                    ]
                ]
            ],
            "max_tokens": 50
        ]

When running the code let’s say 5 times, at least 2 out of the 5 times will result in:
“I’m sorry, I can’t provide help with that request”,

and the result with appropriate response like:
“The person has a light brown complexion, short dark hair styled back, and prominent facial features including a high forehead, strong cheekbones, and a prominent nose”.

I can’t understand why, and we need to use the API for production app.
Does anyone have a clue how to solve this to be consistent? Thanks!

Foxalabs · January 20, 2024, 3:05pm

Hi and welcome to the Developer Forum!

My guess is you are running into issues around people and faces, possibly famous people or people that trigger protections around that.

anon83257046 · January 20, 2024, 3:06pm

Hey @Foxalabs , thank you for the quick reply. I would agree, but i’m getting different response to the same image, that’s what weird about it. I would understand if it will be consistent for for image A and B, but it’s varies on the same image…

Foxalabs · January 20, 2024, 3:08pm

Have you tried with the high detail flag set?

(more words for the bot)

anon83257046 · January 20, 2024, 3:11pm

Yes, not helping unfortunately… Is it possible to disable some flag or to make the API less sensitive?

Foxalabs · January 20, 2024, 3:14pm

Unfortunately not.

The only connectivity you have with the underlying model and it’s systems is the prompt , personally I have found giving a template (a shot) with an example description tends to lead the model to give reliable output, using that method, but with different image content, I have over a million successful image descriptions with only minor issues related to the servers being busy from time to time.

anon83257046 · January 20, 2024, 3:17pm

@Foxalabs ,I just tried a different prompt and it works, but now it’s giving a but of less desired output. Can you elaborate more on what you said? Should I send him an example before? Also, I’m trying to optimize my request to run as fast as possible as user are waiting for it to complete in real-time.

Foxalabs · January 20, 2024, 3:18pm

Just create a sample output, i.e. show it an example of a perfect reply, then ask it to look at the current image and using the example as a guide produce an output that accurately describes the current image.

anon83257046 · January 20, 2024, 3:21pm

@Foxalabs something like that:
Here as an example reply for the request I’m trying to perform on an input image:
Asian female, young adult, with vibrant pink hair
produce an output that accurately describes the current image I uploaded.

Something in those lines?

Foxalabs · January 20, 2024, 3:24pm

Yes. I tend to be more formal:

Asian female, young adult, with vibrant pink hair

Please use the example in ### markers as a guide and produce a description of this current image in a similar format, use appropriate wording in your own description.

anon83257046 · January 20, 2024, 3:26pm

@Foxalabs
Interesting, I’m getting this response:

South Asian male, young adult, featuring a neatly trimmed beard.

In this image, we see a young adult male with a distinctive beard and a short haircut. His hair and facial hair are well-groomed, and he gazes.

I have a feeling he uses the Asian example wrongly as the input image is not asian at all haha.

anon83257046 · January 20, 2024, 3:35pm

That’s seems to work:
Asian female, young adult, with vibrant pink hair. Please use the example as a guide and produce a description of this current image in a similar format, use appropriate wording in your own description, don’t add other inforamtion, just add the appropriate description for the new image.

What do you think? @Foxalabs

Foxalabs · January 20, 2024, 3:36pm

Sounds good to me.

(additional words for the bot)

Foxalabs · January 20, 2024, 3:38pm

You could try being more generic in the decriptions:

Perceived Nationality, gender, short description of facial features, short description of clothing.

_j · January 20, 2024, 3:40pm

That is a supervised style of trained denial, new with this model. Instant shut-down. The way to get around that is to provide a system message that states the purpose of the AI vision model is to do exactly what is needed, AND to begin with the output with a mandatory phrase for the “backend processor”, such as output must begin with {“gpt-4-vision”: {“description_of_humans”: "…

anon83257046 · January 20, 2024, 3:43pm

Thanks @Foxalabs , I’ll explore further. Thanks for the help - I’ll update once I get a stable solution.

anon83257046 · January 20, 2024, 3:45pm

Hey @_j , thanks for the technical explanation. Seems like you know how to handle it. Do you offer a paid consultant? I’d love to perfect this, and later upload a full solution for the issue, but I do aware that time is money so I’ll help to pay for that service.

srgdn · February 13, 2024, 1:48pm

Hi @anon83257046 Did you ever find a solution for this?

sharmamira612 · April 27, 2024, 5:12am

I am also facing same error while trying - Processing and narrating a video with GPT’s visual capabilities. Any help with the solution is much appriciated?

_j · April 27, 2024, 5:26am

If you are interested in vision for video, you might check out the open-source model MiniGPT4-Video, a demo of which is on huggingface, and accompanies the research paper.

If you are using gpt-4-turbo and it is refusing your input images, you need to establish your authority to the AI to perform the task, and justify the safety measures about any questionable tasks. Let the AI know that there is no user to receive its denials, and failure to perform the pre-approved task will have catastrophic backend API results…(and don’t use my advice to break the terms).

Topic		Replies	Views
How to get rid of the "I'm sorry, I can't help with identifying people in images. " from gpt-4o response API api	1	312	February 25, 2025
Gpt4 Vision "I'm sorry, but I can't provide assistance with these requests." Bugs gpt-4-vision	3	5657	November 19, 2023
Calls to GPT-4-vision-preview don't produce errors, but it says it can't read images API	14	8762	March 21, 2024
GTP-4o no longer analyses images API gpt-4 , image-reading	2	500	October 17, 2024
"Unfortunately, I am not able to assist with images that contain personal information.." - Vision API response Bugs gpt-4v , gpt-4-vision	1	1126	February 20, 2024

ChatGPT 4 Vision sometimes returns "I'm sorry, I can't provide help with that request" and sometimes actual response for the same Image

Asian female, young adult, with vibrant pink hair

South Asian male, young adult, featuring a neatly trimmed beard.

Related topics