Which loss function is used to train the Whisper model?

I read the paper on the Whisper model:

Robust Speech Recognition via Large-Scale Weak Supervision

The authors don’t state which loss function they used.

It seems that they trained the model as a classification task (predicting the next token), so did they use cross-entropy loss?
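To make the question concrete, here is a minimal sketch of what I mean by a cross-entropy loss over tokens. This is only my assumption about how such a loss would be computed, not something taken from the paper; the function name `token_cross_entropy` and the toy logits are made up for illustration.

```python
import math

def token_cross_entropy(logits, target_ids):
    """Mean negative log-likelihood of the target tokens under softmax(logits).

    logits: one list of per-vocabulary scores for each decoding step.
    target_ids: the index of the correct token at each step.
    """
    total = 0.0
    for step_logits, target in zip(logits, target_ids):
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(step_logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in step_logits))
        # -log softmax(step_logits)[target]
        total += log_z - step_logits[target]
    return total / len(target_ids)

# Toy example (hypothetical values): 2 decoding steps, vocabulary of 3 tokens.
logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
targets = [0, 2]
print(token_cross_entropy(logits, targets))
```

Is this, averaged over the tokens of each transcript, the objective that was used?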