Does it make sense to evaluate the performance of an LLM in a 'Pre-Training' Stage?

Hi there!
I’m at the ‘pre-training’ stage, where I’m training a language model (it’s very tiny and character-level).

I’m looking to get hands-on experience with the engineering side while evaluating model performance.

I looked around & found that the evaluation metrics depend on the task the model was trained on.

As per my understanding, the pre-training stage is unsupervised learning with no clear objective - rather, it’s just to get the model used to the language, words & structure.

Question

  • Does it make sense to evaluate the model’s performance in the pre-training stage?
  • Does it only make sense to evaluate a model’s performance once it has been fine-tuned for a specific task?

Would appreciate a response & anything else you’d like to add about my understanding, or a better path I can take to learn the engineering side of LLMs.

Cheers!

It depends on whether the model is aligned at the current stage of training; alignment typically reduces raw performance. If you look at the ‘Sparks of AGI’ paper from Microsoft, the GPT-4 model prior to full alignment performed significantly better than it did afterwards.

I think you somewhat answered your own question by writing it down.
Everything you do should have a clear objective; otherwise, how are you going to assess the progress made?
If you define the goal of the unsupervised learning step to be ‘getting used to the language, words & structure’, then you want to see that this actually happened.
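
For a character-level model, a common way to check this during pre-training is to track cross-entropy on a held-out split (often reported as bits-per-character or perplexity) and to sample some text from the model now and then. Here is a minimal sketch, assuming a PyTorch model whose forward pass returns logits over the character vocabulary and a hypothetical `val_loader` yielding (input, target) batches of character indices:

```python
# Minimal sketch: evaluate a character-level LM during pre-training by
# measuring cross-entropy on a held-out split (lower is better).
# Assumes `model(inputs)` returns logits of shape (batch, seq_len, vocab_size)
# and `val_loader` yields (inputs, targets) batches of character indices.
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def evaluate(model, val_loader, device="cpu"):
    model.eval()
    total_loss, total_chars = 0.0, 0
    for inputs, targets in val_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        logits = model(inputs)                      # (batch, seq_len, vocab)
        loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),    # flatten to (batch*seq, vocab)
            targets.reshape(-1),
            reduction="sum",
        )
        total_loss += loss.item()
        total_chars += targets.numel()
    nats_per_char = total_loss / total_chars
    return {
        "val_loss": nats_per_char,                     # avg nats per character
        "bits_per_char": nats_per_char / math.log(2),  # common char-level metric
        "perplexity": math.exp(nats_per_char),
    }
```

If these numbers keep dropping on held-out text (and the generated samples start to resemble the training language), then the ‘get used to the language’ objective is actually being met.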
