Is this a torch bug in Whisper on Raspberry Pi 4?
I have tried Whisper on an M1 MacBook Pro, a VPS, and a Raspberry Pi 4.
On the MacBook and the VPS, Whisper works fine, but on the Raspberry Pi 4 it does not.
Here are the hardware/software specs of the VPS:
- 8 GB RAM, 4 vCPUs
- Debian GNU/Linux 12 (bookworm), Python 3.11.2

Here are the hardware/software specs of the Raspberry Pi 4:
- 8 GB RAM, 4 CPU cores
- Debian GNU/Linux 12 (bookworm), Python 3.11.2

The VPS and the Raspberry Pi 4 have the same specs, yet Whisper fails only on the Raspberry Pi 4.
Here is my code:
```python
import whisper

model = whisper.load_model("base")
result = model.transcribe("video.mp4")
```
The `transcribe` method always fails with `Segmentation fault` on the Raspberry Pi 4.
I suspect a bug in torch, but I am not sure.
I have tried to find the cause, but could not.
Has anyone tried Whisper on a Raspberry Pi 4?
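One way to test the torch hypothesis is to run the script in stages, each in a fresh interpreter, and see which stage is the first to die with SIGSEGV. The sketch below is a hypothetical helper (`first_crashing_stage` is not part of Whisper or torch); it assumes a POSIX system, where a child process killed by a signal reports a negative return code.

```python
import signal
import subprocess
import sys


def first_crashing_stage(stages):
    """Return the name of the first stage that crashes with SIGSEGV, or None.

    `stages` is a list of (name, code) pairs. Stages are cumulative: the
    child interpreter for stage i runs the code of stages 0..i, so later
    stages can use objects created by earlier ones.
    """
    script = ""
    for name, code in stages:
        script += code + "\n"
        proc = subprocess.run(
            [sys.executable, "-c", script],
            capture_output=True,
            text=True,
        )
        # On POSIX, a child killed by signal N reports returncode == -N.
        if proc.returncode == -signal.SIGSEGV:
            return name
        # An ordinary Python error (e.g. ImportError) is not a segfault;
        # stop and show it instead of running later stages on top of it.
        if proc.returncode != 0:
            raise RuntimeError(f"stage {name!r} failed:\n{proc.stderr}")
    return None
```

For the script above, the stages might be `("import torch", "import torch")`, `("import whisper", "import whisper")`, `("load model", "model = whisper.load_model('base')")` and `("transcribe", "result = model.transcribe('video.mp4')")`. A crash already at `import torch` would suggest a problem with the torch build itself, while a crash only at `transcribe` would point at the inference path.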
I’d also be interested in learning how to debug this. I am running into the same issue, except that I am running Whisper in an Ubuntu Docker container on my M1 MacBook Air.
On the Mac (host OS), no issue:
```
>> whisper hello_world.wav
/opt/homebrew/Cellar/openai-whisper/20231106/libexec/lib/python3.11/site-packages/whisper/ UserWarning: FP16 is not supported on CPU; using FP32 instead
warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: English
[00:00.000 --> 00:00.800] Hello world.
```
In the Ubuntu 22.04 Docker container, the same command crashes.
```
# whisper hello_world.wav
/usr/local/lib/python3.10/dist-packages/whisper/ UserWarning: FP16 is not supported on CPU; using FP32 instead
warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Segmentation fault
```
Is there a way to get more granular info on what’s going on?
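Python's standard `faulthandler` module can provide more detail: when enabled, it prints the Python traceback of every thread when the interpreter receives SIGSEGV. It can be turned on without editing any code by setting `PYTHONFAULTHANDLER=1` (or passing `-X faulthandler` to `python`), which also applies to console entry points such as the `whisper` command. The sketch below wraps that in a hypothetical helper, `run_with_fault_traceback`, so the traceback can be collected from a script; it assumes a POSIX system.

```python
import os
import signal
import subprocess


def run_with_fault_traceback(argv):
    """Run `argv` with faulthandler enabled; return (crashed, stderr).

    `crashed` is True when the process was killed by SIGSEGV. In that case
    `stderr` contains the Python traceback faulthandler printed at the
    moment of the crash.
    """
    # PYTHONFAULTHANDLER=1 enables faulthandler in any Python process the
    # command starts, including console scripts such as `whisper`.
    env = dict(os.environ, PYTHONFAULTHANDLER="1")
    proc = subprocess.run(argv, capture_output=True, text=True, env=env)
    # On POSIX, a process killed by signal N has returncode == -N.
    return proc.returncode == -signal.SIGSEGV, proc.stderr
```

Inside the container this would be called as `run_with_fault_traceback(["whisper", "hello_world.wav"])`, or equivalently `PYTHONFAULTHANDLER=1 whisper hello_world.wav` from the shell. The traceback lists only Python frames, so it shows which Python call entered the native code that crashed, not the C function itself; a native debugger such as `gdb` is needed for that level of detail.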
I also hit this segmentation fault on my Raspberry Pi 4B (8 GB, 64-bit Linux). In my case, however, the segfault is intermittent: if I run `transcribe` again it may not crash, but it takes much longer (at least 4 times the usual decoding time for a segment of the same length).
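Although the segmentation fault occurs inside a `try`/`except` block, no stack trace is available. That is expected: a segfault is a fatal signal (SIGSEGV) raised in native code, so it terminates the interpreter instead of raising a Python exception that `except` could catch. It probably has to do with some low-level C library.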
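Because a segmentation fault terminates the whole interpreter, the usual way to survive it is to run the work in a child process. The hypothetical `segfault_rate` helper below does that repeatedly to measure how often an intermittent crash happens; it is a sketch assuming a POSIX system and a code snippet that reproduces the problem.

```python
import signal
import subprocess
import sys


def segfault_rate(code, runs=10):
    """Run `code` in `runs` fresh interpreters; return the fraction that segfaulted.

    A segmentation fault kills the interpreter outright, so try/except in the
    same process never sees it. Each attempt therefore runs in a child
    process, which keeps this (parent) process alive to count the crashes.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    crashes = 0
    for _ in range(runs):
        proc = subprocess.run([sys.executable, "-c", code], capture_output=True)
        # On POSIX, a child killed by SIGSEGV reports returncode == -SIGSEGV.
        if proc.returncode == -signal.SIGSEGV:
            crashes += 1
    return crashes / runs
```

For example, `segfault_rate("import whisper; whisper.load_model('base').transcribe('video.mp4')", runs=5)` would report the fraction of five transcriptions that crashed. Each run reloads the model, so this is slow on a Raspberry Pi.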