Improving performance of a product categorizer (prompting & fine-tuning?)

nicolas12 · August 27, 2024, 9:07am

Hi everyone, I’m new to this forum. I’m using the OpenAI API for some months now, to build some eCommerce-related tools.

I have for instance a “product categorizer” that looks at product/offer titles provided by online retailers, and extract product types.

So far I’m using ChatGPT3.5 in JSON mode which works ok, and have the following prompt:

f’''You are an expert optimizing product feeds for Google Ads. Your task is to extract product types from a product title, and determine if the product is a {categoryName} or an accessory for a {categoryName} or something else.

Return a JSON format with values ‘correctly_categorized’ and ‘product_types’.

If product types are found, list a maximum of two product types, translate them in english and only return the english values.
If the product title is a {categoryName}, returns correctly_categorized=1. If it is an accessory or something else, return 0.

Product title: “{product_title}” ‘’’

Here are a few output for the Tumble Dryer category:

#7 - ‘HAIER Asciugatrice Pompa di calore Libera Installazione 9 Kg Classe a+++ Motore inverter HD90-A3939’ - correctly_categorized: ‘1’ - Heat Pump Tumble Dryer;Inverter Motor Tumble Dryer

#8 - ‘HAIER I-Pro Series 7 HD90-A2979 Heat Pump Tumble Dryer - White’ - correctly_categorized: ‘1’ - Tumble Dryer

#9 - ‘Sharp KD-GCB8S7PW9 - Sèche-linge 8 kg à condensation’ - correctly_categorized: ‘1’ - Tumble Dryer

#10 - ‘SiemClean Air Plus Umluftset LZ21WWI16’ - correctly_categorized: ‘0’ - Air recycling set

#11 - ‘Candy S?che linge Condensation TOEH8A2DEM-S’ - correctly_categorized: ‘1’ - Tumble Dryer

#12 - ‘Siemens WT47XMS1 iQ700 Heat Pump Dryer / 8 kg / A+++ / 176 kWh / Intelligent Cleaning System / AutoDry / Outdoor Drying Program’ - correctly_categorized: ‘1’ - Heat Pump Dryer;Tumble Dryer

#13 - ‘RECAMANIA Resistencia secadora whirpool awz 2500w’ - correctly_categorized: ‘0’ - Heater;Tumble Dryer accessory

#14 - ‘Hoover Asciugatrice Ndp4 H7a2tcbex-s 7 Kg Classe A+±bianco’ - correctly_categorized: ‘1’ - Tumble Dryer

#15 - ‘Samsung DV90T5240AT, Suszarka z technologią OptimalDry, 9 kg, biała - Biały - Size: 9 kg’ - correctly_categorized: ‘1’ - Tumble Dryer

#16 - ‘Samsung DV5000 Heat Pump Tumble Dryer A++, 9kg in Silver (DV90TA040AX/EU)’ - correctly_categorized: ‘1’ - Tumble Dryer

My questions are the following:

I still haven’t understood the difference between {“role”: “system”} and
{“role”: “user”}. I’ve tried splitting my prompt into “system” and “user”, without an impact on the results. Is there a way to improve my prompt by splitting between “system” and “user”?
Could this use case benefit from Fine-tuning?
Thanks for your help,
Nicolas

jr.2509 · August 27, 2024, 10:18am

Welcome to the Forum!

The main purpose of the system message is to steer the overall model behaviour and to provide with the general instructions on how to perform an assigned task. In your specific case it would make sense to include your main instructions for the approach to classification into the system message while then placing the individual product titles in the user message.

If you find that the results you are getting are not accurate, then this could indeed be a candidate for fine-tuning. Note that both for the fine-tuning and later during inference you would still need to apply the category names to select from - the model would not learn these categories during the fine-tuning process. The fine-tuning mainly would serve to achieve a higher accuracy in the extraction of the product types and the categorization.

Let us know if you have more specific questions.

Topic		Replies	Views
Fine-tuning dataset : system, user and assistant content : where to put the real instructions? API fine-tuning	1	978	December 29, 2023
Optimizing System Prompts for fine tuning Prompting fine-tuning	2	532	March 20, 2024
How to further improve Product Categorization Task? Prompting chatgpt	4	1233	June 11, 2024
Help with fine-tuning for text categorization API	4	1290	December 16, 2023
Different roles in the API and their use cases API	1	9172	April 16, 2024

Improving performance of a product categorizer (prompting & fine-tuning?)

Related topics