I am training a model through the API on the content of a WordPress blog. The site has thousands of comments, and I’m not sure whether to use those user comments to train the bot.
On the one hand, the AI would be trained on real, everyday information. On the other hand, comments are often written with poor grammar and spelling, which could introduce noise into the AI.
What decision would you make?
What do you want to achieve?
I imagine you’re talking about fine-tuning. Fine-tuning doesn’t really let the model retain much “real, everyday information”. It will likely just make the generations sound more like the posts you’re feeding it.
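If you do end up including comments, one option is to drop the noisiest ones first, since the model will tend to imitate whatever style it sees. Here is a rough sketch of that idea in Python. Everything in it is an assumption for illustration: the function names, the thresholds, and the `{"text": ...}` record shape are made up, so check the data format your fine-tuning API actually expects before using anything like this.

```python
import json
import re


def looks_clean(comment: str, min_words: int = 5, min_alpha_ratio: float = 0.7) -> bool:
    """Crude quality check. The thresholds are illustrative guesses, not tuned values."""
    text = comment.strip()
    if len(text.split()) < min_words:
        return False  # too short to be useful ("lol", "thanks!")

    non_space = [c for c in text if not c.isspace()]
    if not non_space:
        return False
    alpha_ratio = sum(c.isalpha() for c in non_space) / len(non_space)
    if alpha_ratio < min_alpha_ratio:
        return False  # mostly digits, symbols, or punctuation

    # Reject runs of five or more identical characters ("!!!!!", "sooooo").
    if re.search(r"(.)\1{4,}", text):
        return False
    return True


def filter_comments(comments):
    """Keep only the comments that pass the heuristic check."""
    return [c.strip() for c in comments if looks_clean(c)]


def to_jsonl(comments):
    """Serialize comments as one JSON object per line (hypothetical record shape)."""
    return "\n".join(json.dumps({"text": c}, ensure_ascii=False) for c in comments)
```

You would call `filter_comments` on the list of comment strings exported from WordPress and then review a sample of what was kept and what was dropped by hand. Heuristics like these only catch the most obvious noise, and they say nothing about whether a comment is accurate or on topic.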