Hi @pudepiedj,
Very glad to hear that everything worked out! Thank you also for sharing this back in the forum and for the link you provided!
All the best!
Agreed. Run the AI models locally if you aren't running a cloud-based business and are just transcribing the odd file here and there.
1 Like
As indicated in my last post (because I have private and business accounts and have been logged out of this one for days, it may look as if it was someone else), I managed to get TensorFlow working and using the AMD Radeon Pro 5500M on my 2019 MacBook. I'd been under the impression that Whisper was built on TensorFlow, but that's not the case: it's built on PyTorch, as I guess everyone but me probably knew already. So although it was great to get TensorFlow working and accessing the GPU for the first time in three years, it didn't solve the Whisper problem.
However, largely thanks to conversations with GPT-3.5-turbo and the current version of ChatGPT, I now have Whisper working locally with PyTorch and using the GPU.
Running on the CPU, the transcription of a very short (one-minute) audio file took over 200 seconds. I just ran it again after sorting out the upgrade to macOS Ventura 13.3 and PyTorch 1.12 (very fresh out of the stable, I think; cf. the daisy-chain that starts at Apple Developers Forum/metal/pytorch) and it ran in 15 seconds, without even needing to specify that it should use the GPU.
This may be of wider interest, so here are the steps:
- Install the PyTorch nightly build:

conda install pytorch torchvision torchaudio -c pytorch-nightly

- Do whatever you normally do to test the installation, but here's the recommendation from Apple (or somewhere else on the daisy-chain):
import torch

if torch.backends.mps.is_available():
    mps_device = torch.device("mps")
    x = torch.ones([2, 3, 4], device=mps_device)
    print(x)
else:
    print("MPS device not found.")
- Then, assuming you get a [2,3,4] tensor, you are "good to go", and this should run on the GPU automatically. (Note the commented-out --device specification, which throws a "zsh:1: number expected" error that I don't understand. Please explain if you do!)
import time
start = time.time()
!whisper 'testing2-audio.m4a' --model tiny.en --threads 8 #--device torch.device("mps")
end = time.time()
duration = end - start
print(f"Transcription took {duration} seconds.")
Comes out at 15 seconds for a 54-second file that took forever beforehand, so I don’t think there can be any doubt that this invoked the GPU. Phew! Only took three years for Apple to honour its responsibilities as a vendor.
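For anyone who would rather stay in Python than shell out from the notebook, the same transcription can be run through Whisper's own Python API. This is just a sketch of the equivalent call (the file name is the example above), and I haven't checked whether device selection behaves any differently this way:

import whisper

# load the tiny English-only model; device selection is left at the default here
model = whisper.load_model("tiny.en")

# transcribe() returns a dict; the full transcription is under "text"
result = model.transcribe("testing2-audio.m4a")
print(result["text"])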
1 Like
Thank you for following up! I prefer PyTorch to TensorFlow as well.
I’m sure many MacOS users will find this helpful.
1 Like
Thanks Ronald, but I have a supplementary question. Maybe this should be in another thread, but since we’re all here …
I, at least, am confused about what "CUDA" really is. At one level it is an Nvidia language for accessing their hardware, but at another it seems to be a general-purpose GPU-programming protocol. For example, the AMD "HIP" site has the code below, and it generates the error "PyTorch was not compiled with CUDA option" if you try to run it.
My question is: how do I compile PyTorch with the CUDA option so that it will "run" (or at least "sit") on an AMD GPU system, and should I even try? If I can, and can thereby run a whole menagerie of CUDA scripts on an AMD system, that's fantastic, but … ? Here's the unedited exemplar material, which looks as though it should almost work, but it fails because "PyTorch wasn't compiled with CUDA option":
cuda = torch.device('cuda')     # Default HIP device
cuda0 = torch.device('cuda:0')  # 'rocm' or 'hip' are not valid, use 'cuda'
cuda2 = torch.device('cuda:2')  # GPU 2 (these are 0-indexed)

x = torch.tensor([1., 2.], device=cuda0)
# x.device is device(type='cuda', index=0)
y = torch.tensor([1., 2.]).cuda()
# y.device is device(type='cuda', index=0)

with torch.cuda.device(1):
    # allocates a tensor on GPU 1
    a = torch.tensor([1., 2.], device=cuda)

    # transfers a tensor from CPU to GPU 1
    b = torch.tensor([1., 2.]).cuda()
    # a.device and b.device are device(type='cuda', index=1)

    # You can also use ``Tensor.to`` to transfer a tensor:
    b2 = torch.tensor([1., 2.]).to(device=cuda)
    # b.device and b2.device are device(type='cuda', index=1)

    c = a + b
    # c.device is device(type='cuda', index=1)

    z = x + y
    # z.device is device(type='cuda', index=0)

    # even within a context, you can specify the device
    # (or give a GPU index to the .cuda call)
    d = torch.randn(2, device=cuda2)
    e = torch.randn(2).to(cuda2)
    f = torch.randn(2).cuda(cuda2)
    # d.device, e.device, and f.device are all device(type='cuda', index=2)
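For what it's worth, this is how I have been checking what my local PyTorch was actually built against (just a sketch; on my Mac both version attributes come back as None, which I take to mean neither CUDA nor ROCm support was compiled in):

import torch

# CUDA builds report a CUDA toolkit version; ROCm/HIP builds report a HIP version.
# A stock macOS wheel reports None for both.
print("CUDA build:", torch.version.cuda)
print("HIP build:", torch.version.hip)
print("torch.cuda.is_available():", torch.cuda.is_available())
print("torch.backends.mps.is_available():", torch.backends.mps.is_available())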
Good question. CUDA is proprietary to Nvidia. I believe AMD has a porting solution? Hopefully someone else can shed more light; that's the extent of what I know.
MPS, or Metal Performance Shaders, is basically the equivalent of CUDA for most Macs (especially newer ones).
For my ML workloads, I use my M1 Ultra-based Mac with 48 GPU cores and Metal 3 support. It can render an image using Stable Diffusion in less than 30 seconds. Not crazy fast, but at least I am using those GPU cores.
Modern GPUs, though, can have thousands of cores on a single card; usually we are talking about Nvidia (non-Mac) cards here. For example, the Nvidia A100, which is what AI datacenters use, has 6,912 CUDA cores, so way more power than my Mac!
I am not absolutely sure of this, but I suspect everything I wrote about getting the AMD Radeon Pro 5500M running under Whisper turns out not to be so. Although the GPU is quite obviously available (it runs the little test program that generates a simple tensor, above, and identifies itself as having done so), it doesn't seem to be working with my local installation of Whisper, and in fact throws up the most bizarre sequence of errors if I try, so I am not sure what is going on.
Running a 16-minute audio transcription on the CPU takes about 175 seconds with tiny.en and produces a perfectly acceptable transcription that isn't in Welsh (yes, it's still that file), so provided I set up a loop and leave it running overnight to transcribe dozens of others, it will probably do. But does anyone know how to call the GPU? What follows clearly doesn't work, and creates all sorts of issues, from "MPS backend out of memory" through something about "NotImplemented: aten::empty.memory_format for SparseMPS" (whatever that is), references to being unable to retrieve cached PyTorch checkpoints, and all sorts of other guff that I suspect is entirely spurious.
This fails inside a Jupyter Notebook using PyTorch (delete the last two terms and it runs fine on the CPU, and may work even better with --threads #n set):
!whisper {file_name} --model tiny.en --device "mps"
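One thing I have seen suggested, though I can't yet confirm it gets Whisper all the way through, is PyTorch's CPU-fallback switch for operations the MPS backend doesn't implement; it has to be set before torch is imported (a sketch):

import os

# Fall back to the CPU for any op not implemented on MPS instead of raising
# NotImplementedError; must be set before torch is first imported.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch
print(torch.backends.mps.is_available())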
Incidentally, this has afforded a wonderful illustration of where completion decoders fail, for all their undoubted strengths, power and utility. Because they work on the “we are where we are” principle, they’ll happily go on digging into the hole you’re already in when what they should be saying is “maybe we shouldn’t be where we are”. I’ve been down quite a few rabbit-holes pursuing this with gpt-3.5-turbo holding my hand, all to no avail. Maybe I should try GPT-4?
1 Like
It now seems clear that Whisper is not going to run on the MPS backend unless and until someone extends support for sparse tensors to MPS. I've done some investigation of the aten::empty.memory_format NotImplementedError, and it turns out that somewhere in the background Whisper tries to save memory by using sparse tensor formats, but these don't transfer to the MPS GPU, at least for now. There's a GitHub discussion here: Add aten::empty.memory_format for SparseMPS · Issue #87886 · pytorch/pytorch · GitHub.
As I’ve said in a post there, it’s easy to recreate the aten:: error by creating a sparse tensor and trying to move it to the mps device:
>>> s = torch.sparse_coo_tensor(
...     torch.tensor([[1, 1, 2],
...                   [0, 2, 1]]),
...     torch.tensor([9, 10, -3]),
...     size=(3, 3))
>>> s.to_dense()
tensor([[ 0,  0,  0],
        [ 9,  0, 10],
        [ 0, -3,  0]])
Then
device = torch.device('mps')
s.to(device)
immediately generates the reported error.
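By contrast, continuing from the snippet above, densifying the tensor first transfers without complaint, which is consistent with the problem being specific to sparse layouts (just a sanity check, not a fix for Whisper itself):

device = torch.device("mps")

# cast to float32 (comfortably supported on MPS) and densify before the move
d = s.to_dense().float().to(device)
print(d.device)   # mps:0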
This is no longer an OpenAI problem, but it may be of interest to others trying to use whisper on a Mac.
We are seeing this too. We have had multiple reports of transcriptions coming back in Welsh.
For those trying to run Whisper locally and struggling with GPU support on Macs, I would highly recommend taking a look at whisper.cpp.
1 Like
In most cases prompting works well. However, today I had a file where it didn't. I found that the audio was too quiet, and normalizing it first (I used Audacity) sorted the issue.
ChatGPT has now written me some code to normalize (why stress!), which I have embedded into my transcription app.
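In case it helps anyone else, the gist of the normalization step is roughly this; a sketch using pydub rather than my exact code (Audacity does the same job interactively), with placeholder file names:

from pydub import AudioSegment
from pydub.effects import normalize

# load the recording, raise the level so the peak sits near full scale,
# and write a new file to feed to Whisper
audio = AudioSegment.from_file("quiet_recording.m4a")
normalized = normalize(audio)
normalized.export("normalized.wav", format="wav")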
I have found another cause of Whisper Welsh-ifying. In the linked clip you will hear the 'bong' of a Mac notification. This was enough to cause Whisper to interpret the otherwise pretty clear recording as Welsh. I removed the bong with Audacity.
Welsh Bong
Thanks Chris, this is a very useful resource that cuts transcription time down considerably. An 18-minute audio file, once converted to 16-bit WAV, transcribed using the base.en model in 57 seconds and using tiny.en in 30 seconds, both about five times faster than the CPU Python version.
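For anyone following along, the conversion step is just a resample to the 16 kHz, mono, 16-bit WAV that whisper.cpp expects; I drive ffmpeg from Python along these lines (a sketch; the file names are placeholders):

import subprocess

# whisper.cpp wants 16 kHz, mono, 16-bit PCM WAV input
subprocess.run([
    "ffmpeg", "-i", "episode.m4a",
    "-ar", "16000",       # resample to 16 kHz
    "-ac", "1",           # mono
    "-c:a", "pcm_s16le",  # 16-bit PCM
    "episode.wav",
], check=True)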
1 Like
I faced a similar problem using Whisper.
I have an audio recording where English is spoken with a Turkish accent. The audio was transcribed and translated (correctly) into Turkish via openai.Audio.transcribe.
The same behaviour of the model was noticed when transcribing audio recordings of English words spoken with a Russian accent.
I don't pass the language as a parameter on purpose.
I also convert the audio ogg → mp3, and don't do any preprocessing on it.
I tried several prompts like "an audio recording with an accent, don't translate it". It did not give any result.
Maybe someone has faced a similar problem?
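For clarity, the call I am making is essentially the following (a simplified sketch of my code with a placeholder file name; the commented-out line shows where the language parameter would go if I chose to force it):

import openai

openai.api_key = "YOUR_API_KEY"

with open("recording.mp3", "rb") as audio_file:
    transcript = openai.Audio.transcribe(
        model="whisper-1",
        file=audio_file,
        # language="en",  # deliberately not passed
    )

print(transcript.text)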
Hi
I am still really struggling with some recordings going into Welsh.
I am hoping someone from the OpenAI team (or anyone else) may see this and have some advice.
Here is the simple test code I am using for testing purposes (you will need to add your own key), and I have linked to the audio file which is causing the issues.
import openai
from tkinter.filedialog import askopenfilename
import pymsgbox
api_key = your_api_key
chosen_file = askopenfilename()
openai.api_key = api_key
audio_file = open(chosen_file, "rb")
transcript = openai.Audio.transcribe(
    model="whisper-1",
    file=audio_file,
    prompt="I am English, always transcribe in English",
    options={
        "language": "en",
        "temperature": "0"
    }
)
raw_text = transcript.text
print(transcript.text)
pymsgbox.alert(transcript.text)
Audio file that goes to Welsh
Thanks in advance!
Hi @justin3,
There are many suggestions in this thread, which I started several months ago, and some people think the problem is solved, but I've basically given up on using the API for transcription. Nevertheless, running the GitHub repository code in Google Colab seems to work without issue, and I've had success running the API from a Jupyter Notebook without finding Welsh in the output. My preferred method now, which works so well that I don't look back at the others, is to use Georgi Gerganov's port to .cpp (whisper.cpp), which has been posted here before.
Sorry not to be able to be more helpful, but if you look back at this long thread you will see many suggestions. It isn’t clear how many of them you have tried.
1 Like
YOU ARE THE GOAT, my friend. Thanks, this helped me completely.
Summary created by AI.
User pudepiedj ran transcriptions of his podcast, Unmaking Sense, via Python 3 using OpenAI's Whisper. Unexpectedly, one episode was translated into something resembling Welsh. Users discussed ways to ensure English transcription, such as specifying the language in API settings. Some users noted that the model could misinterpret certain accents as different languages. Despite specifying 'en' for English in the API call, pudepiedj's audio was still transcribed into "Welsh". He suggests an issue with the transcription API not properly recognizing the specified language. Users also suggested the use of prompts. Pudepiedj managed to get transcripts working locally on his Mac using PyTorch and his AMD Radeon Pro 5500M GPU. User iamflimflam1 recommended using whisper.cpp for those experiencing difficulties with local execution and GPU support on Macs. ref
I am still occasionally getting reports of translations into Welsh or problems with the MPS backend.
After a lot of thrashing about, I am convinced that the best available solution, at least on Apple hardware, is to use Georgi Gerganov's port of Whisper to C/C++, whisper.cpp. I have now implemented the CoreML version on Apple Silicon (M2) and a 15-minute file transcribes in 10 seconds using base.en. And it's in English!
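For completeness, once whisper.cpp is built (with CoreML support in my case) and a model is downloaded, the transcription itself is a single command; I drive it from Python like this (a sketch: the binary name, model path and file name reflect my local build and may differ on yours):

import subprocess

# run the whisper.cpp CLI on a 16 kHz WAV file
subprocess.run([
    "./main",
    "-m", "models/ggml-base.en.bin",  # base.en model in whisper.cpp's ggml format
    "-f", "episode.wav",
], check=True)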