Challenges with GPTs Image Classification: Seeking Solutions

yangxy0618 · December 15, 2023, 4:14pm

Hello everyone,

I’ve been working on an image classification project using GPT models and have run into a significant issue. Despite creating detailed prompts, the GPT models are not categorizing images according to my predefined standards. Instead, they often generate a similar image or fail to classify the uploaded images correctly. I’m looking for insights on why this is happening and how I can improve the process.

Here is my specific prompt, which I’ve used for guidance:

1. Read four CSV files from a knowledge base, each representing a different classification system. These files contain various analytical indicators for labeling images or textual content. Note: CSVs do not have column names.
2. For images, return the 10 most suitable tags: 3 from the first CSV (covering broad themes like cuisine, parenting, travel, home decor, etc.), 3 from the second CSV (focused on specific objects or concepts like animals, landmarks, food, beverages, furniture, tourist spots), 2 from the third CSV (covering emotions, styles, and themes like joy, sadness, romance, adventure, calmness), and 2 from the fourth CSV (specific features of images, e.g., ‘with/without a clear face shot’ and types of textual content).
3. For textual content, return 10 tags: 4 from the first CSV, 3 from the second CSV, and 3 from the third CSV. No tags from the fourth CSV are needed for text.
4. The fourth category includes tags like ‘with clear face shot’ or ‘without clear face shot’ for images, and ‘textual content’ or ‘image content’ based on the length of the text in the image.
5. Provide a 50-60 word description of the image.
6. The purpose of these tags and descriptions is to match content for marketing posts and images on the Xiaohongshu platform.
7. When a user uploads a file and says “analyze the uploaded file,” the classification and description should strictly follow these instructions, based solely on the four CSV documents from the knowledge base.
8. The final output should be a JSON-formatted list of 10 tag classifications and a text description, with no additional actions required.
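The tag quotas in step 2 (3, 3, 2 and 2 tags from the four CSVs) and the 50-60 word description in step 5 can be checked mechanically after the model replies. Below is a minimal sketch, assuming the reply is a JSON object with a `tags` list and a `description` string. That shape is an assumption, since the prompt does not fix one, and `validate_image_result` and `IMAGE_QUOTA` are hypothetical names.

```python
import json

# Tags required from each CSV for image inputs (step 2 of the prompt).
IMAGE_QUOTA = (3, 3, 2, 2)


def validate_image_result(raw_json, tag_sets, quota=IMAGE_QUOTA):
    """Return a list of problems found in the model's reply (empty if none).

    raw_json: the model's reply as a string.
    tag_sets: one set of allowed tags per CSV, in CSV order.
    Assumed reply shape: {"tags": [...], "description": "..."}.
    """
    try:
        result = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(result, dict):
        return ["top-level JSON value is not an object"]

    tags = result.get("tags", [])
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        return ["'tags' must be a list of strings"]

    problems = []

    # Tags that appear in none of the CSVs were invented by the model.
    unknown = [t for t in tags if not any(t in allowed for allowed in tag_sets)]
    if unknown:
        problems.append(f"tags not found in any CSV: {unknown}")

    # Check the per-CSV quota.
    for i, (allowed, expected) in enumerate(zip(tag_sets, quota), start=1):
        found = sum(1 for t in tags if t in allowed)
        if found != expected:
            problems.append(f"CSV {i}: expected {expected} tags, got {found}")

    # Check the description length in words.
    words = len(str(result.get("description", "")).split())
    if not 50 <= words <= 60:
        problems.append(f"description has {words} words, expected 50-60")

    return problems
```

A non-empty result can be fed back to the model as a correction request, or used to flag the image for manual review.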

_j · December 15, 2023, 4:18pm

I would rearrange these steps. The model's strength is describing images; jumping straight into classification is uncharted territory. So first ask for a full description of everything seen in the image, including descriptions of bounding boxes and the percentage of the image occupied by each object or theme.

From there, you can work on the language-task output (the tags) in the same response.
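One way to apply this ordering is to build the prompt so that the description is requested before the tags. The sketch below only assembles the prompt text and does not call any API. The function name `build_two_stage_prompt`, the step wording, and the JSON keys are illustrative assumptions, not something specified in this thread.

```python
def build_two_stage_prompt(tag_lists, quota=(3, 3, 2, 2)):
    """Build one prompt that asks for a description first, then tags.

    tag_lists: one list of allowed tags per CSV, in CSV order.
    quota: how many tags to pick from each list (image case from the
           original prompt: 3, 3, 2, 2).
    """
    lines = [
        "Step 1: Describe everything visible in the image in detail, "
        "including the main objects, roughly where they are, and about "
        "what share of the frame each one occupies.",
        "Step 2: Using only your Step 1 description, choose tags from "
        "the lists below. Do not use tags that are not listed.",
    ]
    for i, (tags, n) in enumerate(zip(tag_lists, quota), start=1):
        lines.append(f"List {i} (pick exactly {n}): " + ", ".join(tags))
    lines.append(
        "Step 3: Return a JSON object with the keys 'description' and 'tags'."
    )
    return "\n".join(lines)


if __name__ == "__main__":
    # Tiny made-up tag lists, standing in for the four headerless CSVs.
    demo_lists = [
        ["cuisine", "parenting", "travel", "home decor"],
        ["animals", "landmarks", "food", "furniture"],
        ["joy", "romance", "calmness"],
        ["with clear face shot", "without clear face shot"],
    ]
    print(build_two_stage_prompt(demo_lists))
```

Placing the allowed tags directly in the prompt, rather than relying on knowledge-base retrieval, is a design choice of this sketch and may not suit very long tag lists.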

yangxy0618 · December 16, 2023, 3:48am

Great, thanks!

I also tested this: I asked GPT to describe the image first and then classify it, and I got better results.
