I’m trying to get ChatGPT to label text data. My text data consists of incomplete sentences and tons of comma-delimited phrases. Basically, think of the text as lists or short ideas. I’m having trouble getting either consistent answers or the data returned in a clean JSON format.
Here is an example of one such concept:
Jim Carrey, the mask, child, yellow suit, in a theater.
The problem with this template is that I get inconsistent strings in some categories. For instance, I get a lot of gender words in the age category, and things like hair or clothing terms in the fictional character/people category.
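For reference, the kind of clean output I’m after looks roughly like this (the category names here are illustrative, not my exact template):

```json
{
  "people": ["Jim Carrey"],
  "fictional_characters": ["the mask"],
  "age": ["child"],
  "clothing": ["yellow suit"],
  "location": ["in a theater"]
}
```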
I’ve done quite a few of these with really positive experiences.
Data set size depends a bit on the diversity of categories. My initial recommendation would be to start with a few hundred examples and see where it takes you. Then make a strategic decision around whether and how to expand and refine your training set.
Are your labels open-ended, or do you have a complete list of possible labels to choose from? The latter obviously helps to steer a fine-tuned model a lot.
I’m using regex right now to find people and age terms… so I have some examples of those…
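Something along these lines, with placeholder term lists standing in for my real ones:

```python
import re

# Illustrative only -- my real term lists are much longer; these are placeholders.
PEOPLE_TERMS = r"\b(man|woman|boy|girl|person|people)\b"
AGE_TERMS = r"\b(child|teen(?:ager)?|adult|elderly|young|old)\b"

def find_terms(text: str) -> dict:
    """Return any people/age terms found in a comma-delimited phrase list."""
    return {
        "people": re.findall(PEOPLE_TERMS, text, re.IGNORECASE),
        "age": re.findall(AGE_TERMS, text, re.IGNORECASE),
    }

print(find_terms("Jim Carrey, the mask, child, yellow suit, in a theater"))
# -> {'people': [], 'age': ['child']}
```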
My main concern with fine-tuning is that I have quite a bit of NSFW terms in my main data corpus… I could train on a few hundred safe(r)-for-work examples that have people and age terms… and hope that the model I train for NER afterwards recognizes the NSFW terms as “Other”…
Also - any advice on getting outputs as JSON with 3.5? It works well in 4 with the current template, but not with 3.5-turbo-instruct.
Ok. If you don’t have a closed-ended list of terms that is fine, too. The model should still pick up the overall pattern.
You should include examples of NSFW terms in your training set for the model to understand how to treat these.
In terms of JSON: yes, you can instruct the model via fine-tuning to respond in a desired JSON format. Again, tried and tested here, and it works very well. I agree that in a non-fine-tuned setting GPT-4 is inherently better at this, but you can definitely get consistent JSON results with a fine-tuned GPT-3.5.
Finally, ensure your system prompt is specific. If you are, for instance, worried about the volume of words for a given category, then simply include restrictions in your system prompt in this regard (e.g. no more than X).
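For example, a single training record for the chat fine-tuning endpoint could look like the below (wrapped here for readability; real JSONL is one record per line, and the categories and restrictions are placeholders for your own; the instruct/completions format is analogous with prompt/completion pairs):

```json
{"messages": [
  {"role": "system", "content": "Label each comma-delimited phrase. Respond only with JSON matching the schema {\"people\": [], \"age\": [], \"clothing\": [], \"location\": [], \"other\": []}. No more than 5 terms per category."},
  {"role": "user", "content": "Jim Carrey, the mask, child, yellow suit, in a theater"},
  {"role": "assistant", "content": "{\"people\": [\"Jim Carrey\"], \"age\": [\"child\"], \"clothing\": [\"yellow suit\"], \"location\": [\"in a theater\"], \"other\": [\"the mask\"]}"}
]}
```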
Thanks @anon22939549 - that’s more or less what I’m trying to augment. I hope to benefit from fine-tuning an NER/token classifier downstream using this GPT-labeled dataset. For that, I need a few thousand examples. My current workflow looks something like this:
1. Use GPT-4 to label a few hundred examples with my specific tokens/labels.
2. Fine-tune GPT-3.5-instruct.
3. Use the fine-tuned GPT-3.5 to label some 100 examples and check the output; if successful, go to 4, otherwise repeat 1-3.
4. Use the fine-tuned GPT-3.5 to label some thousands of examples (with a minimum of 1,000 examples in the people, age, etc. categories).
5. Fine-tune RoBERTa/DistilBERT originally trained on CoNLL for my purposes (rough sketch below).
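For step 5, I’m imagining something like this (the label names, file names, and the conversion of GPT labels into BIO tags are all placeholders/assumptions on my side):

```python
# Rough sketch of step 5: fine-tuning a token classifier on the GPT-labeled data.
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          TrainingArguments, Trainer,
                          DataCollatorForTokenClassification)
from datasets import load_dataset

LABELS = ["O", "B-PERSON", "I-PERSON", "B-AGE", "I-AGE", "B-CLOTHING", "I-CLOTHING"]
label2id = {l: i for i, l in enumerate(LABELS)}
id2label = {i: l for l, i in label2id.items()}

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(LABELS),
    id2label=id2label, label2id=label2id)

# Assumes a JSON file with "tokens" and "ner_tags" (integer ids) columns,
# produced by converting the fine-tuned GPT-3.5 labels into BIO tags.
dataset = load_dataset("json", data_files={"train": "gpt_labeled_train.json"})

def tokenize_and_align(example):
    enc = tokenizer(example["tokens"], is_split_into_words=True, truncation=True)
    labels = []
    for word_id in enc.word_ids():
        if word_id is None:
            labels.append(-100)  # special tokens are ignored in the loss
        else:
            labels.append(example["ner_tags"][word_id])  # subwords inherit the word's tag
    enc["labels"] = labels
    return enc

tokenized = dataset["train"].map(tokenize_and_align)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ner-model", num_train_epochs=3),
    train_dataset=tokenized,
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```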
I’m dubious about fine-tuning on NSFW text, as I’ve already been flagged and denied in other cases… when trying to build an NSFW text classifier. Have you had success fine-tuning on NSFW text?
That said, maybe you can just include instructions in your system prompt in this regard, i.e. labeling any NSFW terms as “Other”. It’s a bit of trial and error, but I could see it working out in practice.
One more pointer re: the JSON format. In your system prompt, include the generic JSON schema that you want the model to respond in, in addition to including the specific JSONs as example assistant outputs.
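Concretely, something along these lines (the schema and the worked example are placeholders for your own):

```python
# Sketch: the system prompt carries the generic schema; a worked example is
# supplied as a user/assistant pair. Categories are placeholders.
messages = [
    {"role": "system", "content": (
        "Label the comma-delimited phrases in the input. "
        "Respond only with JSON matching this schema:\n"
        '{"people": [string], "age": [string], "clothing": [string], '
        '"location": [string], "other": [string]}'
    )},
    # One worked example as a user/assistant pair:
    {"role": "user", "content": "Jim Carrey, the mask, child, yellow suit, in a theater"},
    {"role": "assistant", "content":
        '{"people": ["Jim Carrey"], "age": ["child"], "clothing": ["yellow suit"], '
        '"location": ["in a theater"], "other": ["the mask"]}'},
    {"role": "user", "content": "..."},  # the actual text to label goes here
]
```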
Just to clarify, the 100 examples are to check whether I like how the fine-tuned GPT-3.5 labeled the data for me.
I want it to create 5 or so classes for token classification. If it did the trick, I can try to label a few thousand and again randomly sample 100 to spot-check BEFORE fine-tuning the NER transformer…
If the fine-tuned GPT needs more help, I can add more examples and fine-tune again…
But to eventually train the NER transformer, I’m aiming for around 1,000 examples for each class (which could occur in as few as 1,000 training examples but will more likely require 3k-6k examples).
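Roughly how I’d spot-check the per-class counts before committing to the NER fine-tune (file name and field layout are placeholders):

```python
import json
from collections import Counter

# Count labeled terms per category across the GPT-labeled examples.
# Assumes one JSON object per line mapping category -> [terms].
counts = Counter()
with open("gpt_labeled.jsonl") as f:
    for line in f:
        for category, terms in json.loads(line).items():
            counts[category] += len(terms)

print(counts)  # aiming for ~1,000 per class before training the NER model
```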
Yeah, that makes more sense, I must have misunderstood your plan.
One reason I sent the link I did is that you may have some luck finding and modifying one or more of the pre-labeled datasets to use your labels, which would be a quick (and cheap) way to get a relatively large dataset to train from.
I’m actually also looking at a solution for NER data labelling right now, and I was thinking about implementing a fine-tuned version of the BERT NER model. Do you think this could also be a good option?
Context: I want to extract around 10 entity-value pairs from one particular type of document.
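For context, the off-the-shelf usage I’d start from looks like this (using a public CoNLL-trained checkpoint purely as an example); I’d then fine-tune it on my document type and label set before relying on it:

```python
from transformers import pipeline

# Stock CoNLL-trained checkpoint as a starting point; aggregation merges
# subword predictions into whole-entity spans.
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")

for entity in ner("Jim Carrey wore a yellow suit in a theater."):
    print(entity["entity_group"], "->", entity["word"])
```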