Hello to you reading this, and thank you in advance for your time/attention.
I’ve gone through 20+ previous posts on this topic and couldn’t find a satisfactory answer, so I hope I’m not duplicating unnecessarily.
I want to identify marketing-relevant job titles (e.g. with 70-80% probability).
Input: I have a few million profiles, with the job title in the 1st column (e.g. senior colourist) and skills in the 2nd column (e.g. ‘post production’, ‘color grading’, ‘commercials’, ‘baselight’, ‘color correction’, ‘documentaries’, ‘film’, ‘broadcast television’, ‘online editing’, ‘digital cinema’).
I have another list of marketing job titles (e.g. Communications and Marketing Manager) and skills (e.g. copywriting, editing, event planning, social media, public speaking, web content creation, newsletters).
I tried a small training set initially: 2k examples labeled marketing and 2k labeled non-marketing.
I’m getting a high match rate on irrelevant entries, e.g. ‘office manager’ = 99% match to marketing, and ‘hairdresser’ = 84%.
Initially I tried job title only, with two labels: marketing and non-marketing. This worked OK but there were some issues (e.g. a 90% match on ‘traffic warden’, presumably because the job title ‘ad trafficker’ was labeled as ‘marketing’).
Then I concatenated the job title into the skills (as only 1 input is allowed), so I have a much richer text input with the same 2 labels. This performs worse than the first version, which I’m assuming is because the tokens are split out and treated individually, without the context of the full string that makes up the job title.
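To make the concatenation concrete, here is a rough sketch of what I mean. The function names `build_input` and `bag_of_words` are made up for illustration, and the lowercase-and-split-on-whitespace tokenization is only my assumption about what happens to the text, not something I have confirmed about the classifier.

```python
def build_input(job_title, skills):
    """Join the job title and the skills into one text input, title first."""
    return " ".join([job_title] + list(skills))


def bag_of_words(text):
    """Assumed tokenization: lowercase, split on whitespace, drop duplicates."""
    return sorted(set(text.lower().split()))


title = "senior colourist"
skills = ["post production", "color grading", "film"]

combined = build_input(title, skills)
tokens = bag_of_words(combined)

print(combined)
print(tokens)
```

If the text really is handled this way, the title no longer survives as one unit: ‘senior’ and ‘colourist’ end up as separate tokens mixed in with the skill words.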
How the hell do I get this to work?