Help with fine-tuning, think I'm over-fitting, but not sure

Yeah @edmund, I saw the same weird training loss (TL) curve when fine-tuning a binary classifier, over on this thread:

My training file had 4,000 examples, and the system chose 3 epochs for that amount of data. With only 3 epochs, I don’t feel I was overfitting, and every example was a totally different token sequence going in (no repeats).
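
If you want to rule the epoch count out as a variable, you can pin it yourself instead of letting the system pick. A minimal sketch, assuming the current OpenAI Python SDK; the training file ID is a hypothetical placeholder:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Pin the epoch count instead of letting the service auto-select it.
# "file-abc123" is a hypothetical ID from a prior training-file upload.
job = client.fine_tuning.jobs.create(
    training_file="file-abc123",
    model="babbage-002",
    hyperparameters={"n_epochs": 3},
)
print(job.id, job.status)
```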

When I get some time, I’m going to run this model side by side with the old Babbage one and see if there are any discrepancies or degradation in performance (the old model was trained for 4 epochs on the same training data).
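
The plan is just to replay a held-out slice through both models and count disagreements. A rough sketch, assuming both are completion-style fine-tunes that emit a one-token label; the model IDs and prompts below are placeholders:

```python
from openai import OpenAI

client = OpenAI()

# Placeholder model IDs -- substitute your actual fine-tune names.
OLD_MODEL = "ft:babbage-002:org::old4ep"
NEW_MODEL = "ft:babbage-002:org::new3ep"

def classify(model: str, prompt: str) -> str:
    # Single-token, temperature-0 completion, matching a binary-classifier fine-tune.
    resp = client.completions.create(
        model=model,
        prompt=prompt,
        max_tokens=1,
        temperature=0,
    )
    return resp.choices[0].text.strip()

# held_out should be prompts formatted exactly like the training examples.
held_out = ["example prompt 1 ->", "example prompt 2 ->"]

disagreements = 0
for prompt in held_out:
    old_label = classify(OLD_MODEL, prompt)
    new_label = classify(NEW_MODEL, prompt)
    if old_label != new_label:
        disagreements += 1
        print(f"disagree on {prompt!r}: old={old_label} new={new_label}")

print(f"{disagreements}/{len(held_out)} disagreements")
```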

That said, initial spot-checks show the new “overfit” model is performing correctly. I just need more data to be confident.

Still, the TL curve going to 0 is disturbing!