I am testing out ChatGPT’s object classification capability, and found that sometimes it mis-classifies due to contextual information.
For example, given this image:
Prompt #1: identify the large brown animal in the image
Response #1: it is a dog (which is wrong)
Given the same image but with the contextual information masked out:
Prompt #2: identify the large brown animal in the image
Response #2: it is a cougar
I want to get ChatGPT to respond correctly without doing masking on the image, so this is what I did:
Prompt: identify the large brown animal in the image
Response: it is a dog
Prompt: It is actually a cougar. Now give me suitable prompt for GPT4 that are more likely to get the correct identification
Response: To increase the likelihood of correct animal identification using GPT-4’s capabilities, one should provide a detailed description that focuses on key distinguishing features. Here’s a suitable prompt:
“Analyze the animal in the image provided, taking into consideration its size, body shape, and any visible features. The animal seems to be of a large size with a muscular build, indicative of a wild feline. Its fur is a uniform color, without the varied patterning commonly seen in domestic animals. The color of the fur appears to be a light tan or beige, which can blend with natural surroundings. Note any characteristics that may be associated with wild felines, such as the shape of the head, the position and size of the ears, the length and shape of the tail, and the general stance and musculature. These characteristics can help in distinguishing this animal from domestic species and can point towards a wild cat, possibly a cougar. Assess strictly based on these physical attributes, disregarding any contextual information that may lead to assumptions about domesticity.”
Prompt: Analyze the animal in the image provided, taking into consideration its size, body shape, and any visible features. Assess strictly based on these physical attributes, disregarding any contextual information that may lead to assumptions about domesticity.
Response: The animal depicted appears to be a large-sized feline with a robust body, relatively long legs, and a sizable head. Its coat is uniformly colored, suggesting it may be a young adult or adult specimen. This animal’s size and body shape are consistent with those of a wild feline, likely a cougar or mountain lion. Cougars are known for their plain tan to light cinnamon color, which helps them blend into a variety of landscapes. Visible features such as the muscular build, rounded ears, and the absence of spots or stripes on the fur further support this assessment.
The above shows that it is possible to point out GPT-4V’s mistake and then get it to help with coming up with a better prompt.
I think this process can be semi-automated. That is, once we get correction from user we can then generate a new prompt that works better (hopefully). This gives us a process to make the system perform better incrementally without having to fine-tuning the underlying model.
I’d love to hear from the community on whether this is going to work.