I’m having a weird issue with the GPT-4 Vision API. I’m running this request (which mostly works):
let payload: [String: Any] = [
    "model": "gpt-4-vision-preview",
    "messages": [
        [
            "role": "user",
            "content": [
                ["type": "text", "text": "Respond only with a short description of the person in the image. Race, ethnicity, hair style and color, skin color and facial features. I want to use it as a prefix for a person description. Just respond with the description, don't add anything else to your response"],
                [
                    "type": "image_url",
                    "image_url": [
                        "url": "data:image/jpeg;base64,\(base64Image)",
                        "detail": "low"
                    ]
                ]
            ]
        ]
    ],
    "max_tokens": 50
]
When I run the code, say, 5 times, at least 2 of the 5 runs return: “I’m sorry, I can’t provide help with that request”,
and the rest return an appropriate response like: “The person has a light brown complexion, short dark hair styled back, and prominent facial features including a high forehead, strong cheekbones, and a prominent nose”.
I can’t understand why, and we need to use the API in a production app.
Does anyone have a clue how to make this consistent? Thanks!
Hey @Foxalabs, thank you for the quick reply. I would agree, but I’m getting different responses to the same image, and that’s what’s weird about it. I would understand if it were consistent for image A and for image B, but it varies on the same image…
The only connection you have to the underlying model and its systems is the prompt. Personally, I have found that giving a template (a shot) with an example description tends to lead the model to give reliable output. Using that method, but with different image content, I have over a million successful image descriptions, with only minor issues related to the servers being busy from time to time.
@Foxalabs, I just tried a different prompt and it works, but now it’s giving slightly less desirable output. Can you elaborate on what you said? Should I send it an example first? Also, I’m trying to optimize my request to run as fast as possible, as users are waiting for it to complete in real time.
Just create a sample output, i.e. show it an example of a perfect reply, then ask it to look at the current image and, using the example as a guide, produce an output that accurately describes the current image.
@Foxalabs something like this:
Here is an example reply for the request I’m trying to perform on an input image:
Asian female, young adult, with vibrant pink hair
produce an output that accurately describes the current image I uploaded.
Please use the example in ### markers as a guide and produce a description of this current image in a similar format, use appropriate wording in your own description.
That seems to work:
Asian female, young adult, with vibrant pink hair. Please use the example as a guide and produce a description of this current image in a similar format, use appropriate wording in your own description, don’t add other information, just add the appropriate description for the new image.
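For anyone who wants to try the same thing, here is a rough Swift sketch of how the example-as-guide prompt could be wired into the payload from the first post. The function name `makeFewShotPayload` and its parameters are made up for illustration, and wrapping the example in ### markers is just one way to set it apart; the model name, `detail`, and `max_tokens` values are copied from the original request.

```swift
import Foundation

/// Builds a chat-completions payload that shows the model one example
/// description and asks for a description of the attached image in the
/// same format. `makeFewShotPayload` is a hypothetical helper name.
func makeFewShotPayload(base64Image: String,
                        exampleDescription: String,
                        maxTokens: Int = 50) -> [String: Any] {
    // Wrap the example in ### markers so the instruction can point at it.
    let prompt = """
    ###
    \(exampleDescription)
    ###
    Please use the example in ### markers as a guide and produce a description \
    of this current image in a similar format. Use appropriate wording in your \
    own description and don't add other information.
    """

    // Explicit types keep the nested literals unambiguous for the compiler.
    let imageURL: [String: String] = [
        "url": "data:image/jpeg;base64,\(base64Image)",
        "detail": "low"
    ]
    let textPart: [String: Any] = ["type": "text", "text": prompt]
    let imagePart: [String: Any] = ["type": "image_url", "image_url": imageURL]
    let content: [[String: Any]] = [textPart, imagePart]
    let userMessage: [String: Any] = ["role": "user", "content": content]
    let messages: [[String: Any]] = [userMessage]

    return [
        "model": "gpt-4-vision-preview",
        "messages": messages,
        "max_tokens": maxTokens
    ]
}
```

Calling `makeFewShotPayload(base64Image: base64Image, exampleDescription: "Asian female, young adult, with vibrant pink hair")` gives a dictionary that can be passed to `JSONSerialization.data(withJSONObject:)` the same way as the original payload.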
That is a supervised style of trained denial, new with this model: an instant shut-down. The way to get around it is to provide a system message stating that the purpose of the AI vision model is to do exactly what is needed, AND to begin the output with a mandatory phrase for the “backend processor”, e.g. the output must begin with {"gpt-4-vision": {"description_of_humans": "…
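A minimal Swift sketch of that suggestion, in case it helps: a system message that states the purpose, an instruction that the reply must start with a fixed prefix, and a small check on the reply. The helper names and the exact system-message wording are my own assumptions, not something confirmed in this thread, and whether this reduces refusals is untested here.

```swift
import Foundation

/// Builds a payload with a system message that states the model's purpose
/// and requires the reply to begin with `requiredPrefix`.
/// The system-message wording below is illustrative only.
func makeSystemPromptedPayload(base64Image: String,
                               requiredPrefix: String,
                               model: String = "gpt-4-vision-preview") -> [String: Any] {
    let systemText = """
    You are a vision model whose purpose is to write short physical \
    descriptions of people in images for a backend processor. \
    Your output must begin with: \(requiredPrefix)
    """
    let systemMessage: [String: Any] = ["role": "system", "content": systemText]

    let imageURL: [String: String] = [
        "url": "data:image/jpeg;base64,\(base64Image)",
        "detail": "low"
    ]
    let textPart: [String: Any] = ["type": "text", "text": "Describe the person in this image."]
    let imagePart: [String: Any] = ["type": "image_url", "image_url": imageURL]
    let content: [[String: Any]] = [textPart, imagePart]
    let userMessage: [String: Any] = ["role": "user", "content": content]
    let messages: [[String: Any]] = [systemMessage, userMessage]

    return ["model": model, "messages": messages, "max_tokens": 50]
}

/// Returns true when the reply (ignoring surrounding whitespace) starts with
/// the mandatory prefix; a refusal sentence will fail this check.
func replyHasRequiredPrefix(_ reply: String, prefix: String) -> Bool {
    return reply.trimmingCharacters(in: .whitespacesAndNewlines).hasPrefix(prefix)
}
```

With the prefix quoted above, up to the point where it is elided, the call would look like `makeSystemPromptedPayload(base64Image: base64Image, requiredPrefix: "{\"gpt-4-vision\": {\"description_of_humans\": \"")`. A reply that fails `replyHasRequiredPrefix` can then be treated as a refusal and retried, instead of being passed on to the app.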
Hey @_j, thanks for the technical explanation. Seems like you know how to handle it. Do you offer paid consulting? I’d love to perfect this, and later upload a full solution for the issue, but I am aware that time is money, so I’m happy to pay for that service.
I am also facing the same error while trying “Processing and narrating a video with GPT’s visual capabilities”. Any help with a solution is much appreciated.
If you are interested in vision for video, you might check out the open-source model MiniGPT4-Video, which has a demo on Hugging Face and accompanies a research paper.
If you are using gpt-4-turbo and it is refusing your input images, you need to establish your authority to the AI to perform the task, and justify the safety measures about any questionable tasks. Let the AI know that there is no user to receive its denials, and failure to perform the pre-approved task will have catastrophic backend API results…(and don’t use my advice to break the terms).