How to tune GPT-4V's image classification performance without model fine-tuning

kaihuchen · March 19, 2024, 8:22pm

I am testing out ChatGPT’s object classification capability, and found that it sometimes misclassifies an object because of contextual information elsewhere in the image.

For example, given this image:

Prompt #1: identify the large brown animal in the image
Response #1: it is a dog (which is wrong)

Given the same image but with the contextual information masked out:

Prompt #2: identify the large brown animal in the image
Response #2: it is a cougar

I want to get ChatGPT to respond correctly without masking the image, so this is what I did:

Prompt: identify the large brown animal in the image
Response: it is a dog
Prompt: It is actually a cougar. Now give me suitable prompt for GPT4 that are more likely to get the correct identification
Response: To increase the likelihood of correct animal identification using GPT-4’s capabilities, one should provide a detailed description that focuses on key distinguishing features. Here’s a suitable prompt:
“Analyze the animal in the image provided, taking into consideration its size, body shape, and any visible features. The animal seems to be of a large size with a muscular build, indicative of a wild feline. Its fur is a uniform color, without the varied patterning commonly seen in domestic animals. The color of the fur appears to be a light tan or beige, which can blend with natural surroundings. Note any characteristics that may be associated with wild felines, such as the shape of the head, the position and size of the ears, the length and shape of the tail, and the general stance and musculature. These characteristics can help in distinguishing this animal from domestic species and can point towards a wild cat, possibly a cougar. Assess strictly based on these physical attributes, disregarding any contextual information that may lead to assumptions about domesticity.”
Prompt: Analyze the animal in the image provided, taking into consideration its size, body shape, and any visible features. Assess strictly based on these physical attributes, disregarding any contextual information that may lead to assumptions about domesticity.
Response: The animal depicted appears to be a large-sized feline with a robust body, relatively long legs, and a sizable head. Its coat is uniformly colored, suggesting it may be a young adult or adult specimen. This animal’s size and body shape are consistent with those of a wild feline, likely a cougar or mountain lion. Cougars are known for their plain tan to light cinnamon color, which helps them blend into a variety of landscapes. Visible features such as the muscular build, rounded ears, and the absence of spots or stripes on the fur further support this assessment.

The above shows that it is possible to point out GPT-4V’s mistake and then get it to help write a better prompt.

I think this process can be semi-automated. That is, once we get a correction from the user, we can generate a new prompt that (hopefully) works better. This gives us a way to make the system perform better incrementally without having to fine-tune the underlying model.
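A minimal sketch of what that loop might look like, assuming a hypothetical `ask(prompt, image_path)` callable that wraps whatever vision model call you use (it is not a real library function). The meta-prompt wording and the substring check for the correct label are also my assumptions. Note that in the transcript above I trimmed the suggested prompt by hand so it no longer described the answer; this sketch only asks the model not to name it.

```python
from typing import Callable

# Hypothetical signature: (prompt, image_path) -> model's text response.
AskFn = Callable[[str, str], str]

# Assumed wording, adapted from the correction used in the transcript.
META_TEMPLATE = (
    "Previous prompt: {prompt}\n"
    "Previous answer: {answer}\n"
    "It is actually a {label}. Now give me a suitable prompt that is more "
    "likely to get the correct identification, without naming the answer."
)


def refine_prompt(ask: AskFn, image_path: str, prompt: str,
                  correct_label: str, max_rounds: int = 3) -> tuple[str, bool]:
    """Try a prompt; on a wrong answer, ask the model for a better prompt."""
    for round_no in range(max_rounds + 1):
        answer = ask(prompt, image_path)
        # Crude success check: the user-supplied label appears in the answer.
        if correct_label.lower() in answer.lower():
            return prompt, True
        if round_no == max_rounds:
            break  # out of rounds; do not generate a prompt we will not test
        # Feed the correction back and use the reply as the next prompt.
        prompt = ask(
            META_TEMPLATE.format(prompt=prompt, answer=answer, label=correct_label),
            image_path,
        )
    return prompt, False
```

The returned prompt could then be stored and reused for later images of the same kind, with a human still reviewing it before it goes into use.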

I’d love to hear from the community on whether this is going to work.


curt.kennedy · March 20, 2024, 4:57am

With image classification problems, I’ve been having recent success with image embedding engines. It can be a bit of a grind: you embed and label many images, then for each new image you find the labeled embeddings that correlate most strongly with it and average their labels to get your answer.

Essentially you are exploiting the fact that embedding is a continuous function, so similar images land close together in embedding space.
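Roughly, the lookup step could be sketched like this. It assumes the embeddings have already been computed by some image embedding model (not shown here) and are nonzero vectors. Using cosine similarity as the "correlation", and a similarity-weighted vote as the "average" for categorical labels, is one possible reading, and `classify` and `k` are illustrative names.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two nonzero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify(query, labeled, k=3):
    """Label a query embedding from its k most similar labeled embeddings.

    `labeled` is a list of (embedding, label) pairs.
    """
    # Score every labeled example against the query, best match first.
    scored = sorted(
        ((cosine(query, vec), label) for vec, label in labeled),
        reverse=True,
    )
    # Each of the top k neighbours votes for its label, weighted by similarity.
    votes = defaultdict(float)
    for sim, label in scored[:k]:
        votes[label] += sim
    return max(votes, key=votes.get)


# Toy 2-D vectors standing in for real image embeddings.
examples = [
    ([1.0, 0.0], "cougar"),
    ([0.9, 0.1], "cougar"),
    ([0.0, 1.0], "dog"),
    ([0.1, 0.9], "dog"),
]
print(classify([0.8, 0.2], examples))
```

With real data you would have far more labeled images per class, and the choice of `k` is something to tune.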
