it’s probably just because they use some variation of cross entropy for training, maybe with a scaling factor, no?
I can’t think of a good simple loss function that allows for negatives in this context
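For context, a minimal sketch of what "cross entropy with a scaling factor" could look like, and why negative target weights break it (the function name and `scale` parameter are illustrative assumptions, not from this thread):

```python
import math

def scaled_cross_entropy(logits, target_probs, scale=1.0):
    # Hypothetical sketch: softmax over the logits, then a weighted
    # negative log-likelihood, scaled by a constant factor.
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    log_total = math.log(sum(exps))
    log_probs = [z - m - log_total for z in logits]
    # Each term is -p_target * log_prob. A negative target weight would
    # flip the sign of its term, turning a penalty into a reward -- which
    # is why plain cross entropy assumes nonnegative targets.
    return -scale * sum(p * lp for p, lp in zip(target_probs, log_probs))
```

Note that negative *logits* are fine (softmax handles them); it is negative *target weights* that have no clean interpretation under this loss.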