Prompt to classify merchants based on a list of available categories

pedmorsou · September 19, 2023, 8:05pm

I am using the OpenAI Playground with the gpt-3.5-turbo model to classify merchants.
As input I provide a merchant name and a list of available categories.

I want the response to contain the most likely category from the available categories, but I often get back a category that is not in the list.

e.g. Merchant name: Starbucks
Available categories: Breakfast, Lunch, Dinner
The suggested category could be “Coffee shop”, which is not one of the available categories.

How can I avoid this?

Here’s my system prompt

You are a system that automatically categorises merchants. You will receive properly formatted JSON as input.

The JSON contains “merchant name” and “available categories”.

Based on the merchant name you will select the most suitable category based on the “available categories”.

Result should include only the category in this format:

{“most suitable category”}.

You must select only one of the “available categories” provided in the JSON. You cannot include in the response a category that is not part of the JSON.

User prompt
{
  "merchant name": "Amazon Prime",
  "available categories": [
    "Air travel",
    "Fuel costs",
    "Gifts",
    "Insurance",
    "Internet",
    "Lodging",
    "Meals and entertainment",
    "Mileages",
    "Other",
    "Parking",
    "public transportation",
    "Taxi",
    "Tools and small equipment",
    "Train",
    "Trainings"
  ]
}

Any help on how I could improve the prompt so that the response never includes a non-existent category is appreciated.

Foxalabs · September 19, 2023, 9:26pm

Hi and welcome to the Developer Forum!

“Given these categories ###{category_list}### pick one that best classifies this Merchant name {merchant_name}”

You can pass category_list as JSON if you like, or as CSV or any other standard list format the model would know. It seems to work without issues in my admittedly short test.

(Using Python)
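
A minimal Python sketch of how the template above could be filled in, plus a check that the reply really is one of the supplied categories. The function names `build_prompt` and `match_category` and the `fallback` parameter are made up for illustration and are not from any library. The call to the model itself is left out, so `reply` stands for whatever text the model returned.

```python
import json


def build_prompt(merchant_name, categories):
    """Fill the template, passing the category list as JSON."""
    category_list = json.dumps(categories)
    return (
        f"Given these categories ###{category_list}### "
        f"pick one that best classifies this Merchant name {merchant_name}"
    )


def match_category(reply, categories, fallback=None):
    """Return the listed category named by the reply, or fallback if none matches.

    Hypothetical helper: surrounding whitespace, braces, quotes and full stops
    are stripped, and the comparison ignores case.
    """
    cleaned = reply.strip().strip("{}\"'“”. ").casefold()
    for category in categories:
        if category.casefold() == cleaned:
            return category
    return fallback


categories = ["Breakfast", "Lunch", "Dinner"]
prompt = build_prompt("Starbucks", categories)

# A reply that is not in the list is rejected instead of being passed on.
print(match_category("Coffee shop", categories, fallback="Other"))
```

Checking the reply in code means an invented category can be caught and retried or mapped to a default such as “Other”, rather than relying on the prompt alone.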

wclayf · September 20, 2023, 12:08am

If my understanding of “Embeddings” is correct, you might be able to also use a Vector Database (and cosine similarity) to approach this problem and not even need to make a GPT call, other than once to get each embedding. This is because you are basically describing a ‘semantic similarity’ problem here (sort of), and theoretically the embedding vector for “Starbucks” for example would be semantically closer to “Coffee Shop” than “Dinner”.
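
A rough Python sketch of the comparison step described above, assuming you already have one embedding vector per category and one for the merchant. The three-dimensional vectors below are invented toy numbers used only to show the mechanics. Real embeddings would come from an embedding model and have many more dimensions. `closest_category` is a hypothetical helper name.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_category(merchant_vector, category_vectors):
    """Return the category whose vector is most similar to the merchant's."""
    return max(
        category_vectors,
        key=lambda name: cosine_similarity(merchant_vector, category_vectors[name]),
    )


# Toy vectors (assumption: made up for illustration, not real embeddings).
toy_categories = {
    "Breakfast": [0.9, 0.1, 0.0],
    "Lunch": [0.1, 0.9, 0.0],
    "Dinner": [0.0, 0.2, 0.9],
}
toy_merchant = [0.8, 0.3, 0.1]

print(closest_category(toy_merchant, toy_categories))
```

Because the result is always picked from the keys of `category_vectors`, this approach cannot return a category outside the list.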
