I'm not certain this applies to your file, but M4A files are notorious for storing the moov atom at the end of the file rather than at the beginning. Please check whether that is the case here. If you just want it to work without going down the moov-atom rabbit hole, simply try:
ffmpeg -i input.m4a output.mp3 # or, even better, output.wav
…and then use the MP3 (or WAV) file instead of the M4A. MP3 and WAV are not MP4-style containers, so they have no moov atom at all and its position cannot cause trouble. If that was the reason for the issue, this should fix it.
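If you would rather confirm the diagnosis before converting anything, here is a rough Python sketch that lists the top-level boxes ("atoms") of an MP4/M4A file and reports whether moov comes before mdat. The function names `top_level_boxes` and `moov_before_mdat` are my own, and the parser only reads the top-level box headers, so treat it as a quick check under those assumptions rather than a full MP4 parser.

```python
import struct


def top_level_boxes(path):
    """Return [(type, offset, size), ...] for the top-level boxes of an MP4/M4A file."""
    boxes = []
    with open(path, "rb") as f:
        f.seek(0, 2)              # jump to the end to learn the file size
        file_size = f.tell()
        offset = 0
        while offset + 8 <= file_size:
            f.seek(offset)
            # Each box starts with a 32-bit big-endian size and a 4-byte type.
            size, box_type = struct.unpack(">I4s", f.read(8))
            if size == 1:
                # A size of 1 means a 64-bit size follows the type field.
                ext = f.read(8)
                if len(ext) < 8:
                    break
                size = struct.unpack(">Q", ext)[0]
            elif size == 0:
                # A size of 0 means the box runs to the end of the file.
                size = file_size - offset
            if size < 8:
                break             # corrupt header; stop instead of looping forever
            boxes.append((box_type.decode("latin-1"), offset, size))
            offset += size
    return boxes


def moov_before_mdat(path):
    """True if moov precedes mdat, False if it follows, None if either is missing."""
    order = [box_type for box_type, _, _ in top_level_boxes(path)]
    if "moov" not in order or "mdat" not in order:
        return None
    return order.index("moov") < order.index("mdat")
```

Calling `moov_before_mdat("input.m4a")` and getting `False` would mean the moov atom sits after the media data, which is the situation described above.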
In my experience with Whisper via the OpenAI API, you need to send the complete bytes of the audio file; I'm not sure whether streams work. So you have to download the file to disk or memory first, and then send the full bytes in the request.
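As a sketch of the "full bytes in memory" approach, something like the following might work. I'm assuming here that you use the official `openai` Python package (v1 or later) with `OPENAI_API_KEY` set in the environment, and that the audio lives at some URL of yours; the helper names `fetch_audio_bytes`, `as_named_upload` and `transcribe` are made up for illustration.

```python
import io
import urllib.request


def fetch_audio_bytes(url):
    """Download the whole file into memory and return it as bytes (no streaming)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def as_named_upload(data, filename):
    """Wrap raw bytes in a BytesIO that carries a .name, so the upload has a file extension."""
    buf = io.BytesIO(data)
    buf.name = filename
    return buf


def transcribe(data, filename="audio.mp3"):
    """Send the complete audio bytes to the transcription endpoint."""
    # Assumption: the `openai` package (v1+) is installed; imported lazily so the
    # helpers above can be used and tested without it.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    return client.audio.transcriptions.create(
        model="whisper-1",
        file=as_named_upload(data, filename),
    )
```

The filename matters mostly for its extension, so it should match the format you actually send (for example `audio.mp3` after the ffmpeg conversion above).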