Validation loss is decreasing but WER is increasing in Whisper model training?

Hi, I’ve been using the Hugging Face library to fine-tune the Whisper model. While the WER was initially decreasing, I’ve noticed it began to rise even though the validation loss continues to drop. Could the issue be related to my evaluating on a very small dataset?

As shown in the image, after the 80th step the WER suddenly started increasing, from 13 to 28.


This looks like classic overfitting: the model is beginning to memorize the training set rather than learn the underlying structure.
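It also helps to remember that loss and WER measure different things, so they can diverge: loss is per-token likelihood under teacher forcing, while WER is an edit distance on the fully decoded text. A minimal dependency-free sketch of how WER is computed (libraries such as `jiwer` or Hugging Face `evaluate` do this, plus text normalization, for you):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn the first i ref words into the first j hyp words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat on the mat", "the cat sat on a mat"))  # 1 sub / 6 words ≈ 0.167
```

Because decoding is autoregressive, a model that is slightly overconfident on training-like inputs can still lower its loss while producing worse full transcriptions, which is consistent with your plot.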

You can try adding more variation to your source data (noise, clicks, pops, other likely audio interference, slowing speech down, speeding it up, etc.) to create more "synthetic" data, or obtain more raw training data.
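As a sketch of two of the cheapest augmentations (assuming 16 kHz mono float waveforms, which is what Whisper expects), here is noise injection at a target SNR and speed perturbation via simple resampling; for production use, libraries such as `audiomentations` or `torchaudio` provide richer, tested transforms:

```python
import numpy as np

def add_noise(wave: np.ndarray, snr_db: float = 20.0, seed: int = 0) -> np.ndarray:
    """Mix in white noise at a target signal-to-noise ratio (in dB)."""
    rng = np.random.default_rng(seed)
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise

def change_speed(wave: np.ndarray, factor: float) -> np.ndarray:
    """Resample by linear interpolation: factor > 1 speeds up (shorter clip),
    factor < 1 slows down (longer clip). Note this also shifts pitch."""
    new_len = int(round(len(wave) / factor))
    old_idx = np.linspace(0, len(wave) - 1, num=new_len)
    return np.interp(old_idx, np.arange(len(wave)), wave)

# 1 second of a 440 Hz tone at 16 kHz, standing in for real speech
wave = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noisy = add_noise(wave, snr_db=20.0)   # same length, audible hiss added
fast = change_speed(wave, 1.1)         # ~0.91 s
slow = change_speed(wave, 0.9)         # ~1.11 s
```

Applying a random subset of such transforms on the fly each epoch means the model never sees exactly the same waveform twice, which directly attacks the memorization you are seeing.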

The validation loss (assuming your validation set also consists of recordings with transcripts) is still improving even at the end.

If the validation set is truly representative of the types and variety of audio the model will see in practice, and is interchangeable in quality with the training data, it would seem you can keep training longer, provided you do not mind degrading Whisper's original multilingual training and its performance on the external audio dataset your WER is measured against.
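Practically, you can also guard against this automatically with early stopping keyed to WER rather than loss (with the Hugging Face Trainer that means `metric_for_best_model="wer"`, `greater_is_better=False`, and `load_best_model_at_end=True`). The underlying bookkeeping is just this, shown with hypothetical WER values echoing your plot:

```python
def best_checkpoint(wers: list[float], patience: int = 3) -> int:
    """Return the index of the best (lowest-WER) evaluation, stopping the
    scan once `patience` consecutive evals fail to improve on it."""
    best_idx, since_best = 0, 0
    for i in range(1, len(wers)):
        if wers[i] < wers[best_idx]:
            best_idx, since_best = i, 0
        else:
            since_best += 1
            if since_best >= patience:
                break  # early stop: keep the best checkpoint seen so far
    return best_idx

# Hypothetical eval-step WERs, bottoming out around your step 80
wers = [25.0, 18.0, 14.5, 13.0, 15.2, 19.8, 24.0, 28.0]
print(best_checkpoint(wers))  # 3, i.e. the checkpoint that scored 13.0
```

With `load_best_model_at_end=True` the Trainer restores exactly that best checkpoint, so the later overfitted steps cost you nothing but compute.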