About 100 seconds pass between sending the request and receiving the 524 response, which lines up with Cloudflare's default 100-second limit for that error. Since the Whisper API accepts audio files up to 25 MB, 100 seconds may simply not be enough to process a file that large.
The failures do appear to be random rather than tied to file length. Some long audio files do not trigger the cf_gateway_timeout, while some shorter ones do.
Even splitting the audio into shorter segments seems to have little effect.
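Since the timeouts look random, one possible workaround is to retry the same request when a 524 comes back and to record how long each attempt took. The following is only a minimal sketch under assumptions: `send` is a hypothetical callable standing in for whatever code uploads the audio, assumed to return a `(status_code, body)` tuple, and `call_with_retry` and its parameters are illustrative names, not part of any OpenAI library.

```python
import time
from typing import Callable, List, Tuple

CLOUDFLARE_TIMEOUT_STATUS = 524  # the status code reported in this thread


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str, List[float]]:
    """Call `send` until it returns a non-524 status or attempts run out.

    `send` is a hypothetical stand-in for the real upload code; it must
    return (status_code, response_body). The result is the last status,
    the last body, and the number of seconds each attempt took.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    durations: List[float] = []
    status, body = 0, ""
    for attempt in range(max_attempts):
        started = time.monotonic()
        status, body = send()
        durations.append(time.monotonic() - started)

        if status != CLOUDFLARE_TIMEOUT_STATUS:
            break  # success or a different error: retrying will not help here
        if attempt < max_attempts - 1:
            # Exponential backoff between attempts: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    return status, body, durations
```

If each failed attempt really does take about 100 seconds, as observed above, it is probably worth keeping `max_attempts` small. The returned `durations` list can also help confirm whether every failure lands near the 100-second mark.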