I’m currently immersed in the exciting world of fine-tuning language models (LLMs) and I’m seeking guidance on the most effective strategies for evaluating, monitoring, and retraining them. As I delve deeper into this process, I’ve encountered various challenges and uncertainties, prompting me to reach out to this knowledgeable community for insights and advice.
I have fine-tuned the pre-trained LLM (GPT 3.5 turbo-1106) which contains multiple AI skills like text to sql conversion, text to xml conversion and product information. My question is about the best practice to create and manage the dataset for these multiple skills using single LLM model. After fine tuning generally we evaluate and examine the model output. My question is around the re-tuning the model again for the mistakes it is doing for the previous dataset. For further fine tuning, do we need to keep the dataset we used for previous fine tuning and append the new dataset or as the model is already trained with the previous dataset and we just need to keep the data for the mistakes it is doing?
Moreover, how frequently should a model be retrained to maintain its relevance and accuracy? Are there specific triggers or indicators that signal the need for retraining, such as changes in data distribution or task requirements?