First: I would check whether the limit you are hitting is the file size or the audio length.
For transcriptions, you can send Opus audio encoded in its voice (VoIP) mode at a low bitrate. This is three hours at under 20MB:
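To tell the two apart, a quick sketch that prints both numbers for a file. It assumes ffprobe (shipped with ffmpeg) is installed; the function name and the file name in the usage line are just placeholders:

```shell
# Print the container duration (seconds) and file size (bytes) of an audio file.
# Assumes ffprobe is on PATH; "audio_info" is a made-up helper name.
audio_info() {
  ffprobe -v error \
    -show_entries format=duration,size \
    -of default=noprint_wrappers=1 \
    "$1"
}
```

Call it as `audio_info audio.mp3`; it should print a `duration=` line and a `size=` line that you can compare against whatever limits apply to you.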
ffmpeg -i audio.mp3 -vn -map_metadata -1 -ac 1 -c:a libopus -b:a 12k -application voip audio.opus
It’s more efficient for everybody, and limiting to voice bandwidth can improve the transcription.
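As a rough check of the size figure, bitrate times duration gives the audio payload. This is only a sketch: it ignores the small Ogg container overhead, and the function name is made up:

```shell
# Estimate encoded size in MB (10^6 bytes) from bitrate in kbit/s and duration in seconds.
# Ignores container overhead, so the real file will be slightly larger.
estimate_mb() {
  awk -v kbps="$1" -v secs="$2" \
    'BEGIN { printf "%.1f\n", kbps * 1000 * secs / 8 / 1000000 }'
}

estimate_mb 12 10800   # 12 kbit/s for three hours
```

For 12k over three hours that comes to about 16.2 MB, which is where the "under 20MB" comes from.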
Then: is it terminating at a silence? A long stretch of silence would normally get you some hallucinated text, not a premature finish, but the behavior may have changed.
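One way to see whether the cut-off lines up with a silence is ffmpeg's silencedetect filter. A sketch, assuming ffmpeg is installed; the -40dB noise floor and 5 second minimum are guesses you would tune for your recording, and the function name is a placeholder:

```shell
# List silence_start / silence_end timestamps found in an audio file.
# $1 = file, $2 = noise floor (default -40dB, an assumed value),
# $3 = minimum silence length in seconds (default 5, also assumed).
find_silences() {
  # silencedetect logs to stderr, so merge it into stdout before filtering.
  ffmpeg -hide_banner -nostats -i "$1" \
    -af "silencedetect=noise=${2:--40dB}:d=${3:-5}" \
    -f null - 2>&1 | grep -E 'silence_(start|end)'
}
```

Run `find_silences audio.opus` and compare the reported timestamps with the point where the transcript stops.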