Hi, I am experimenting with fine-tuning GPT-3 specifically for my niche.
My aim is to have a model trained specifically on my niche, rather than relying on few-shot prompting.
I have a lot of unstructured text in the form of reports, ebooks, etc. (tens of thousands of pages).
I want to convert this unstructured text into a structured dataset so that I can use it to fine-tune GPT-3 for my niche.
I will then use this fine-tuned model for various tasks such as question answering, completion, and classification.
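To make clear what I mean by a structured dataset, here is a rough Python sketch of one possible target format. It assumes the JSONL layout with `prompt` and `completion` fields used for GPT-3 fine-tuning, with an empty prompt for plain domain text. The function names (`chunk_text`, `to_records`, `to_jsonl`) and the 200-word chunk size are my own placeholders, not an established recipe.

```python
import json


def chunk_text(text, max_words=200):
    """Group paragraphs into chunks of roughly max_words words.

    A single paragraph longer than max_words is kept whole,
    so a chunk can exceed the limit in that case.
    """
    chunks, current, count = [], [], 0
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        if not words:
            continue  # skip blank paragraphs
        # Close the current chunk if this paragraph would push it past the limit.
        if current and count + len(words) > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.extend(words)
        count += len(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


def to_records(chunks):
    """Wrap each chunk as a record with an empty prompt (assumed layout)."""
    return [{"prompt": "", "completion": " " + chunk} for chunk in chunks]


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)


if __name__ == "__main__":
    sample = "First paragraph of a report.\n\nSecond paragraph of the same report."
    print(to_jsonl(to_records(chunk_text(sample, max_words=200))))
```

This only covers the plain-text case. For question answering or classification the prompts would need real content, and that is the part I do not know how to produce without manual labeling.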
Several methods have been proposed for this, such as named-entity recognition (NER) and BERT-based classification, among others.
Manually annotating or labeling thousands of pages of documents would obviously take forever and cost a lot.
I would appreciate some expert direction here, as I could not find any best practices online.
Thank you.