Whisper on Raspberry Pi 4 gives Segmentation fault

This may be a torch bug that shows up when running whisper on a Raspberry Pi 4.

I have tried whisper on three machines: an M1 MacBook Pro, a VPS, and a Raspberry Pi 4.

On the MacBook and the VPS, whisper works fine.

On the Raspberry Pi 4, it does not work.

Here are the hardware and software specs of the VPS:

  • 8 GB RAM, 4 vCPU
  • Debian GNU/Linux 12 (bookworm), Python 3.11.2

Here are the hardware and software specs of the Raspberry Pi 4:

  • 8 GB RAM, 4 vCPU
  • Debian GNU/Linux 12 (bookworm), Python 3.11.2
    (all the same as the VPS)

The VPS and the Raspberry Pi 4 have the same specs, but whisper fails only on the Raspberry Pi 4.
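Matching RAM, OS release, and Python version does not guarantee that the CPU architecture or the installed package versions match. A minimal sketch for comparing the two machines is below. Run it on each and diff the output. The package list (`torch`, `openai-whisper`, `numpy`) is an assumption about what is installed, and `environment_report` is a hypothetical helper name.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("torch", "openai-whisper", "numpy")):
    """Collect details that can differ even when OS and Python versions match."""
    report = {
        "machine": platform.machine(),    # CPU architecture string reported by the OS
        "platform": platform.platform(),  # kernel / distribution summary
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Any difference in the `machine` or `torch` lines between the VPS and the Pi would be a reasonable first thing to look at, though it does not by itself prove the cause.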

This is my code:

import whisper
model = whisper.load_model("base")
result = model.transcribe("video.mp4")

The ‘transcribe’ method always gives ‘Segmentation fault’ on the Raspberry Pi 4.

I guess it is a bug in torch, but I am not sure.

I have tried to find the cause, but could not.

Has anybody else tried whisper on a Raspberry Pi 4?

I’d also be interested in learning how to debug this. I am running into the same issue, except that I am running in an Ubuntu Docker container on my M1 MacBook Air.

On the Mac (host OS), no issue:

>> whisper hello_world.wav 
/opt/homebrew/Cellar/openai-whisper/20231106/libexec/lib/python3.11/site-packages/whisper/transcribe.py:115: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: English
[00:00.000 --> 00:00.800]  Hello world.

In the Ubuntu 22.04 Docker container, the same command crashes.

# whisper hello_world.wav 
/usr/local/lib/python3.10/dist-packages/whisper/transcribe.py:115: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Segmentation fault

Is there a way to get more granular info on what’s going on?
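One standard-library option for getting more detail is Python's `faulthandler` module, which dumps the Python traceback of each thread when the process receives a fatal signal such as SIGSEGV. The sketch below is a minimal example, not a confirmed fix. The log file name `whisper_crash.log` and the helper names are hypothetical, and it assumes the `whisper` package is importable.

```python
import faulthandler


def enable_crash_log(path="whisper_crash.log"):
    """Send fatal-signal tracebacks (SIGSEGV, SIGBUS, ...) to a log file."""
    log = open(path, "w")
    faulthandler.enable(file=log, all_threads=True)
    # The caller must keep this file object open for as long as the
    # handler is enabled, so return it instead of closing it.
    return log


def transcribe(path, model_name="base"):
    # Imported here so the handler is already active while torch loads.
    import whisper

    model = whisper.load_model(model_name)
    return model.transcribe(path)
```

Calling `enable_crash_log()` and then `transcribe("hello_world.wav")` should leave the Python-level call stack at the moment of the crash in the log file. For the command-line tool, setting the environment variable `PYTHONFAULTHANDLER=1` before running `whisper hello_world.wav` enables the same handler with output on stderr. The traceback shows only Python frames, so it narrows down which call triggered the crash rather than the exact native function.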

I also encountered this segmentation fault on my Raspberry Pi 4B (8 GB, 64-bit Linux). However, in my case, the ‘segmentation fault’ is random. If I run transcribe again, it might not crash, but it takes a very long time (at least 4 times the duration for decoding a segment of the same length).

Although the ‘segmentation fault’ occurs inside a try-except block, no stack trace is available. It must have to do with some low-level C library.
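A try-except block cannot catch this because a segmentation fault is a signal that kills the whole interpreter, not a Python exception. One way to detect it (and retry, since the crash is intermittent) is to run the transcription in a child process and inspect its return code. This is a sketch for POSIX systems. `run_isolated` and `TRANSCRIBE_SNIPPET` are hypothetical names, and the snippet assumes `whisper` and a file called `video.mp4` are available.

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout=None):
    """Run Python source in a child interpreter and report how it ended."""
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # On POSIX, a return code of -N means the child was killed by signal N.
    crashed = proc.returncode == -signal.SIGSEGV
    return crashed, proc.returncode, proc.stderr


# Source passed to the child; the parent survives even if this segfaults.
TRANSCRIBE_SNIPPET = (
    "import whisper\n"
    "model = whisper.load_model('base')\n"
    "print(model.transcribe('video.mp4')['text'])\n"
)
```

Calling `run_isolated(TRANSCRIBE_SNIPPET)` returns `crashed=True` when the child died from SIGSEGV, and the captured stderr then contains whatever `faulthandler` managed to write before the process exited.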