What are some best practices for fine-tuning language models on custom datasets, and how can I effectively evaluate their performance after training?