I run a fairly large forum aimed at a specific niche, and I've been experimenting with having GPT-3.5 (and GPT-4) provide the first automated answer whenever someone posts a new question. The answers it gives are hit or miss: sometimes the answer is spot on, other times it's utterly incorrect. My forum has 1.3 GB of text data consisting of questions and answers. I don't think GPT has been trained on data from my forum (although it does know something about it), so I was wondering whether it is at all possible to train it (through embeddings?) on the forum's data.
I just want to feed it cleaned-up text versions of all the forum threads. That means it will also be trained on some incorrect data (since people do give wrong answers on my forum from time to time), but the data is mostly accurate.
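To make the embeddings idea concrete, here is a rough sketch of what I have in mind. As far as I understand it, embeddings would not retrain the model; they would be used to look up the most similar past threads and paste them into the prompt as context. Everything here is an assumption for illustration: the `embed` function is a toy word-count stand-in for a real embedding model, and the sample threads and function names are made up.

```python
import math
import re
from collections import Counter

# Hypothetical sample data standing in for cleaned-up forum threads.
THREADS = [
    "Q: How do I reset my password? A: Use the reset link on the login page.",
    "Q: Why does the export fail on large files? A: Increase the upload size limit in settings.",
    "Q: Can I change my username? A: Yes, under profile settings.",
]


def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_threads(question, threads, k=2):
    """Return the k threads most similar to the question."""
    q_vec = embed(question)
    ranked = sorted(threads, key=lambda t: cosine(q_vec, embed(t)), reverse=True)
    return ranked[:k]


def build_prompt(question, threads, k=2):
    """Assemble a prompt that gives the model the retrieved threads as context."""
    context = "\n\n".join(top_threads(question, threads, k))
    return (
        "Answer using only these forum threads:\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )
```

Calling `build_prompt("How can I reset my password?", THREADS, k=1)` would produce a prompt containing only the password thread, which would then be sent to GPT along with the new question.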
Is this possible?