Which loss function is used for the Whisper model?

I read the paper about the Whisper model:

Robust Speech Recognition via Large-Scale Weak Supervision

They didn’t state which loss function they used.

It seems that they trained the model as a classification task, so did they use cross-entropy loss?
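
To make concrete what I mean by cross-entropy in a classification-style setup, here is a minimal sketch of a token-level cross-entropy, where each decoding step is treated as a classification over the vocabulary. This only illustrates my guess and is not confirmed by the paper; the function name `token_cross_entropy`, the toy logits, and the 3-token vocabulary are all made up for illustration.

```python
import math

def token_cross_entropy(logits, targets):
    """Mean cross-entropy over a sequence of token predictions.

    logits:  list of per-step score lists, one score per vocabulary entry
    targets: list of correct token indices, one per step
    (Hypothetical helper, not taken from the Whisper paper or code.)
    """
    total = 0.0
    for step_logits, target in zip(logits, targets):
        # Log-sum-exp with max subtraction for numerical stability
        m = max(step_logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in step_logits))
        # Negative log-probability assigned to the correct token
        total += log_z - step_logits[target]
    return total / len(targets)

# Toy example: 2 decoding steps, assumed vocabulary of 3 tokens
logits = [[2.0, 0.5, 0.1], [0.0, 0.0, 0.0]]
targets = [0, 2]
loss = token_cross_entropy(logits, targets)
```

If the model really is trained this way, the loss would be this quantity averaged over all target tokens in a batch, but that is exactly the part I could not find stated in the paper.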


Hi!

I am writing my Master’s thesis on a Whisper-related topic and need to discuss the loss functions used for training … I suspect it really is cross-entropy loss, but have you found any proof (other than forum posts) by any chance?

Thanks!