Where do you get so much content from? For example, for my paraphrasing model I had to Google paraphrase examples and collect them by hand, and I ended up with only about 100. Is there a better way of doing it?
Thanks
Hi @davut,
Can you please elaborate a little bit?
Do you want to put your data into .jsonl format?
Search Kaggle Datasets and Google Datasets first.
Yeah, I’ve had a lot of luck with Kaggle and Wikipedia data. You can also try Common Crawl, but that data is harder to mine.
No, I just want to find a dataset or create one easily. For example, let’s say I want to make a fine-tuned model for Instagram captions. I would probably spend hours trying to scrape examples.
Thank you so much, so many datasets
One way could be to create another model that is good at generating .jsonl files. The generated .jsonl files can then be used to fine-tune or train new models for your task.
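In case it helps, here is a rough sketch of what writing examples to a .jsonl file could look like in Python, whether the pairs come from a dataset, from scraping, or from another model. The field names `prompt` and `completion`, and the helper names `write_jsonl` and `read_jsonl`, are just assumptions for illustration; use whatever keys your fine-tuning tool actually expects.

```python
import json
from pathlib import Path


def write_jsonl(pairs, path):
    """Write (source, paraphrase) pairs as one JSON object per line.

    The "prompt"/"completion" keys are an assumed layout, not a requirement
    of any particular tool.
    """
    with Path(path).open("w", encoding="utf-8") as f:
        for source, paraphrase in pairs:
            record = {"prompt": source, "completion": paraphrase}
            # ensure_ascii=False keeps non-ASCII text readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a .jsonl file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

You would call it with something like `write_jsonl([("How are you?", "How is it going?")], "paraphrases.jsonl")` and then check the result with `read_jsonl("paraphrases.jsonl")` before using the file for fine-tuning.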