Yeah @edmund, I saw the same weird training-loss (TL) curve when fine-tuning a binary classifier, in this thread over here:
My training file had 4000 examples, and the system chose 3 epochs for that amount of data. So with only 3 epochs I don’t think I was overfitting, and every example was distinct (no repeated tokens going in).
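For anyone curious how I saw which epoch count got picked, here’s a minimal sketch of pulling the job’s hyperparameters, assuming the legacy openai-python (<1.0) fine-tuning API; the job ID is a placeholder:

```python
# Minimal sketch: inspect which epoch count the system picked for the fine-tune.
# Assumes the legacy openai-python (<1.0) API; "ft-XXXXXXXX" is a placeholder job ID.
import openai

job = openai.FineTune.retrieve("ft-XXXXXXXX")
print(job["hyperparams"])  # e.g. shows the n_epochs chosen for the 4000-example file
```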
When I get some time, I’m going to run this model side by side with the old Babbage fine-tune and check for any discrepancies or degradation in performance (the old model used 4 epochs and the same training data).
But initial spot-checks show the new “overfit” model is performing correctly. Just need more data to be confident.
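In case it helps, here’s roughly the spot-check I have in mind, a minimal sketch assuming the legacy openai-python (<1.0) Completions API; the model names and the train.jsonl path are placeholders:

```python
# Minimal sketch: run the same labelled prompts through both fine-tunes and count
# where they disagree. Assumes the legacy openai-python (<1.0) Completions API;
# model names and the JSONL path are placeholders.
import json
import openai

OLD_MODEL = "babbage:ft-personal-old"  # placeholder: 4-epoch Babbage fine-tune
NEW_MODEL = "babbage:ft-personal-new"  # placeholder: 3-epoch "overfit" fine-tune

def classify(model: str, prompt: str) -> str:
    """Return the single completion token, i.e. the predicted class label."""
    resp = openai.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=1,    # binary classifier: one label token is enough
        temperature=0,
        logprobs=2,      # keep label probabilities around for closer inspection
    )
    return resp["choices"][0]["text"].strip()

disagreements = 0
with open("train.jsonl") as f:  # the shared training file, reused here as a spot-check set
    for line in f:
        ex = json.loads(line)
        old_label = classify(OLD_MODEL, ex["prompt"])
        new_label = classify(NEW_MODEL, ex["prompt"])
        if old_label != new_label:
            disagreements += 1
            print("disagree:", ex["prompt"][:60], old_label, "vs", new_label)

print("total disagreements:", disagreements)
```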
Still, the TL curve going to 0 is disturbing!