Categorization + Entity Extraction + Normalization

oliver · October 31, 2022, 7:16am

Hey y’all,

I’m trying to classify a prediction from a niche domain and extract some entities from it. First tests running classification and extraction as two separate steps have been encouraging, but I’m looking for the best way to do it:

For example, let’s assume I have a number of generic predictions about drinks:

“Tee is always way too expensive”
“I think black gold costs more than 5 dollar in starbucks”
“Orance juice tastes ugly at mc donalds”

I want to categorize these sentences within a predefined set of drinks {tea, coffee, orange, apple, water, …}. Importantly, the categories cannot be extracted as entities because they might be implicit (the example above is fictional).
I also want to extract a set of optional entities, e.g. “restaurant”, “price”, “taste”.
Finally, I want to normalize these entities so that “mc donalds”, “mcdonalds”, and “mcci” all come back as “Mc Donalds”.
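
The normalization step described above can be handled outside the model with a plain lookup table. The following is only a minimal sketch: the alias table, `ALIASES`, `LOOKUP`, and `normalize_entity` are hypothetical names made up for illustration, and a real alias list would have to be collected from the data.

```python
import re

# Hypothetical alias table (an assumption): canonical name -> known spellings.
ALIASES = {
    "Mc Donalds": ["mc donalds", "mcdonalds", "mcci"],
    "Starbucks": ["starbucks"],
}


def _key(text: str) -> str:
    """Lowercase the text and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


# Reverse lookup built once: normalized spelling -> canonical name.
LOOKUP = {
    _key(alias): canonical
    for canonical, aliases in ALIASES.items()
    for alias in aliases
}


def normalize_entity(raw: str) -> str:
    """Return the canonical name, or the stripped input if the alias is unknown."""
    return LOOKUP.get(_key(raw), raw.strip())
```

Unknown spellings pass through unchanged, so they can be logged and added to the table later.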

How would I best go about this? Would a pretrained model fine-tuned with examples like:

{"prompt": "Orance juice tastes ugly at mc donalds ->", "completion": " Category: orange\nPrice:\nTaste: ugly\nRestaurant: Mc Donalds"}

work?

Or would I train 3 different models to do this in 3 steps?

And finally, how can I tell a pretrained model to categorize within a finite set of categories (and not invent new ones)? Do I have to add the list of categories to the prompt every time?
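
One way to approach the finite-category question is to put the list in the prompt and also check the answer in code, so an invented category is caught rather than trusted. This is a minimal sketch, not a tested recipe: `build_prompt` and `parse_completion` are hypothetical helpers, the prompt wording is an assumption, and the call to the model itself is left out.

```python
CATEGORIES = ["tea", "coffee", "orange", "apple", "water"]  # set from the post
FIELDS = ["Category", "Price", "Taste", "Restaurant"]


def build_prompt(sentence: str) -> str:
    """Build a prompt that spells out the allowed categories (wording is an assumption)."""
    cats = ", ".join(CATEGORIES)
    return (
        f"Classify the sentence into exactly one of these categories: {cats}.\n"
        "Answer with one line per field: Category, Price, Taste, Restaurant.\n"
        "Leave a field empty if it is not mentioned.\n"
        f"Sentence: {sentence}\n"
    )


def parse_completion(text: str) -> dict:
    """Parse 'Key: value' lines; a category outside the fixed set becomes None."""
    result = {field: "" for field in FIELDS}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in result:
            # Drop surrounding whitespace and straight or curly double quotes.
            result[key] = value.strip().strip('"“”')
    category = result["Category"].lower()
    result["Category"] = category if category in CATEGORIES else None
    return result
```

A `None` category can then be retried or sent for manual review instead of being stored.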

joshbachynski · October 31, 2022, 4:04pm

I need a real example.
Short answer:
Fine-tuning barely works unless you have 10k good, clean examples AND you are trying to do something far simpler that does not go against the “opinion” of the trained corpus. For example, you said “tea is always too expensive”; if this is not the view of the corpus, you will have maximal trouble making it remember this effectively.

I’ve always had better success with prompts.

jeffinbournemouth · October 31, 2022, 10:31pm

I have been working on a very similar task for a few days.
My project is to extract categories & tags from Google business reviews for each business in a list AND summarize the customer sentiment/likes/dislikes for the business.

The sole reason my client wanted to do all aspects of the task with one prompt was cost.
There are 260,000 reviews, so it is obviously more economical for him if all extraction/summarization can be done in one pass through the data.

My tests so far show that it is possible to complete the task with a zero-shot prompt. HOWEVER, the quality of the output is superior if a separate custom prompt is used for each part of the task (2X AI cost).

It’s a trade-off: if you want lower cost you will also need to settle for lower quality, because if you ask the prompt to do several things at the same time the quality will suffer (in my experience).
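
The “2X AI cost” arithmetic can be sketched as below. Only the 260,000 review count comes from this thread; the token count per call and the price are placeholder assumptions, not real figures, and `estimated_cost` is a hypothetical helper.

```python
def estimated_cost(n_reviews: int, tokens_per_call: int,
                   price_per_1k_tokens: float, passes: int) -> float:
    """Rough cost: every pass sends each review through the model once."""
    total_tokens = n_reviews * tokens_per_call * passes
    return total_tokens / 1000 * price_per_1k_tokens


N_REVIEWS = 260_000          # from the thread
TOKENS_PER_CALL = 500        # placeholder assumption
PRICE_PER_1K_TOKENS = 0.002  # placeholder assumption, not a quoted price

one_pass = estimated_cost(N_REVIEWS, TOKENS_PER_CALL, PRICE_PER_1K_TOKENS, passes=1)
two_pass = estimated_cost(N_REVIEWS, TOKENS_PER_CALL, PRICE_PER_1K_TOKENS, passes=2)
```

Whatever the real numbers are, the second pass roughly doubles the bill because the review text is sent twice.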

oliver · November 4, 2022, 9:55am

Thanks a lot for sharing, this is much appreciated.

Do you mind sharing the prompt you use for the categorization? Do you just input something like “[review] ->”, or something like “The following is a google business review and its category:\n[review] ->”?

jeffinbournemouth · November 4, 2022, 2:42pm

I create a scenario:

Fitnesscentrecomparison.com lists the applicable categories, facilities, equipment, and classes for every fitness centre in Australia. Visitors to the website can view the centre’s categories, facilities, equipment, and classes.
The following google reviews are for Anytime Fitness gym, 251A Morphett St, Adelaide SA 5000, Australia, and mention all of the categories, facilities, equipment, and classes, for this specific fitness centre. Website visitors can read the profile page and decide if they want to become a member. The name of the expert is

Fitness Expert

Hi Fitness Expert, please read through the following fitness centre reviews to determine ALL of the categories, facilities, equipment, and classes for this location, so we can help prospective customers decide if they want to become a member.

Reviews:

1
2
3
4
5
6 etc etc

End of reviews. Here are all of the categories applicable and the specific facilities, equipment, and classes mentioned in the reviews:
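
For many businesses, a scenario prompt like the one above could be assembled from a template. This is a minimal sketch with a shortened scenario text: `build_review_prompt` and its parameters are hypothetical names, and the exact wording should be copied from the prompt that works for you.

```python
def build_review_prompt(site: str, business: str, reviews: list[str]) -> str:
    """Assemble a scenario-style prompt that ends where the model should continue."""
    numbered = "\n".join(
        f"{i}. {text.strip()}" for i, text in enumerate(reviews, start=1)
    )
    return (
        f"{site} lists the applicable categories, facilities, equipment, and classes "
        "for every fitness centre in Australia.\n"
        f"The following Google reviews are for {business}.\n\n"
        "Reviews:\n\n"
        f"{numbered}\n\n"
        "End of reviews. Here are all of the categories applicable and the specific "
        "facilities, equipment, and classes mentioned in the reviews:"
    )
```

Ending the prompt on the colon leaves the model to continue with the list itself.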

HMAQBL · August 23, 2024, 8:44am

Do you provide the data (reviews) in an Excel sheet?

jeffinbournemouth · August 23, 2024, 10:23am

Yes, you can process the reviews from Excel, but it is much easier to use Google Sheets with a Google Sheets add-on (one that performs API calls to OpenAI/Anthropic/Mistral/Groq etc.) to do the processing.

Then you can have a results column which contains your required data.
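
The same “results column” idea also works on a sheet exported as CSV. This is a minimal sketch using only the standard library: `add_results_column` and the column names are hypothetical, and `extract` stands in for whatever function sends the review to the model and returns its answer.

```python
import csv
import io


def add_results_column(csv_text: str, extract,
                       review_col: str = "review",
                       result_col: str = "result") -> str:
    """Return the CSV with one extra column holding extract(review) for each row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=[*reader.fieldnames, result_col], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # `extract` is a placeholder for the real model call.
        row[result_col] = extract(row[review_col])
        writer.writerow(row)
    return out.getvalue()
```

Passing a stub such as `lambda text: text.upper()` as `extract` is enough to check the plumbing before any API calls are made.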

HMAQBL · August 25, 2024, 7:14am

Thanks @jeffinbournemouth.
I have never tried Google Sheets with ChatGPT. Do you mind sharing the add-on name?

jeffinbournemouth · August 25, 2024, 10:26am

I built my own, but I know there are a few others available.

Try searching for “google sheets AI addon” or “Google sheets Chatgpt addon”.

HMAQBL · August 25, 2024, 1:32pm

@jeffinbournemouth Thanks, I found a very useful one on Google Workspace.

HMAQBL · September 19, 2024, 1:03pm

Can you guide me on how I can build my own with Google Sheets?

jeffinbournemouth · September 19, 2024, 4:33pm

You want to build your own add-on?

HMAQBL · September 20, 2024, 6:19am

Yes, one that helps me with data in Google Sheets.

jeffinbournemouth · September 20, 2024, 8:08am

If you do not have experience building add-ons, then I would suggest you use a free Sheets add-on that most closely fits your specific use case.

If you cannot find a suitable add-on, then I recommend you hire a Google Sheets add-on developer on Upwork (or similar), as it is a non-trivial task.
