@acoloss Try the prompting techniques from myself and @RonaldGRuckus, this should fix the “Welsh” problem.
The reason it went to “Welsh” on your accent is that it auto-detects the language and outputs in that language. So you can let it do this and it should work; if not, then prompt it in that language. It works for me when the input is Spanish without prompting (the second most used language in my neck of the woods).
So, if your mother-in-law is French, see if it auto-detects correctly; otherwise put:
prompt: "<SOME_TEXT_IN_FRENCH_HERE>"
This is only needed if the auto-detect isn’t working.
As for “ta” transcribing as “to the”, no idea if that will work, but you are free to try it in the prompt as well, e.g.:
prompt: "I say that I am going ta shop, and I mean I am going to the shop"
In a prompt, can I just write “This audio will never be in Welsh”?
sy1
I had the same problem, but the fix was simple in the end! You just have to supply the optional parameter called “language” and give it the string value “en”, et voilà (and there you go)! I code in VB.NET, so I can’t give an example in Python, unfortunately.
' Requires Imports System.Net.Http and Imports System.Net.Http.Headers
Dim requestContent As New MultipartFormDataContent()
Dim modelContent As New StringContent("whisper-1")
Dim languageContent As New StringContent("en") ' Force English transcription
Dim promptContent As New StringContent("Please include commas and new paragraphs where appropriate.")
Dim fileContent As New ByteArrayContent(System.IO.File.ReadAllBytes(sFileMP3))
fileContent.Headers.ContentType = MediaTypeHeaderValue.Parse("audio/mpeg")
' Add the model, prompt, language and file to the request
requestContent.Add(modelContent, "model")
requestContent.Add(promptContent, "prompt")
requestContent.Add(languageContent, "language")
requestContent.Add(fileContent, "file", sFileMP3)
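For anyone on Python rather than VB.NET, here is a rough equivalent of the same multipart request using the requests library; the file name and API-key handling are illustrative, not from the post above.
import os
import requests

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

with open("recording.mp3", "rb") as f:  # hypothetical file name
    response = requests.post(
        "https://api.openai.com/v1/audio/transcriptions",
        headers={"Authorization": f"Bearer {OPENAI_API_KEY}"},
        data={
            "model": "whisper-1",
            "language": "en",  # force English, as in the VB.NET example
            "prompt": "Please include commas and new paragraphs where appropriate.",
        },
        files={"file": ("recording.mp3", f, "audio/mpeg")},
    )

print(response.json()["text"])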
This seems to have worked, thanks.
I didn’t add the prompt, but so far it’s worked in 10 goes; before that, it was failing about half the time, I’d say.
sy1
Awesome! I don’t understand why it hasn’t been publicised more widely, but there you go.
The reason the prompting route was recommended was that folks were setting the language parameter and it wasn’t working. But sure, try language, and if that doesn’t work, try prompting, or just do both at the same time.
Well … sorry to have been absent for a few days, having been the one who originally raised this, and thank you to all those who have contributed what seems to work, for some at least, as a solution. Unfortunately, I just took @curt.kennedy’s advice and added a long prompt with a language designation to my original audio, and it still came out in Welsh! I will take another look when I have more time.
I have, incidentally, solved the problem @linus offered some help with and can now run tensorflow on my AMD Radeon Pro 5500M, thanks to a very helpful blog here, so it is at least conceivable that I will have a local GPU-enabled version of whisper at some stage. Don’t hold your breath!
Wow, you did both and it didn’t work?
My only other guess is to drop the language parameter and just use the prompt. And if that doesn’t work, just keep adding more and more English to the prompt. We were getting prompt-only to work (without setting the language).
Whaaaaat.
Can you provide the audio? Or send it via DM?
I know you solved it but I’m genuinely interested. Why Welsh?
Totally recommend running Whisper locally.
linus
Hi @pudepiedj,
Very glad to hear that everything worked out! Thank you also for sharing this back in the forum and for the link you provided!
All the best!
Agree: run the AI models locally if you aren’t running a cloud-based business and are just transcribing here and there.
As indicated in my last post (because I have private and business accounts, I have been logged out of this one for days, so it may look as if it was someone else), I managed to get TensorFlow working and using the AMD Radeon Pro 5500M on my 2019 MacBook. I’d been under the impression that Whisper was built on TensorFlow, but that’s not the case: it’s built on PyTorch, as I guess everyone but me probably knew already. So although it was great to get TensorFlow working and accessing the GPU for the first time in 3 years, it didn’t solve the Whisper problem.
However, largely thanks to conversations with GPT-3.5-turbo and the current version of ChatGPT, I now have Whisper working locally with PyTorch and using the GPU.
Running on the CPU, the transcription of a very short (1-minute) audio file took over 200 seconds. I just ran it again after sorting out the upgrade to macOS Ventura 13.3 and PyTorch 1.12 (very fresh out of the stable, I think; cf. the daisy-chain that starts at Apple Developers Forum/metal/pytorch), and it ran in 15 seconds, without even needing to specify that it should use the GPU.
This may be of wider interest, so here are the steps:
- Install the nightly PyTorch build (which has MPS support):
conda install pytorch torchvision torchaudio -c pytorch-nightly
- Do whatever you normally do to test the installation, but here’s the recommendation from Apple (or somewhere else on the daisy-chain):
import torch

if torch.backends.mps.is_available():
    mps_device = torch.device("mps")
    x = torch.ones([2, 3, 4], device=mps_device)
    print(x)
else:
    print("MPS device not found.")
- Then, assuming you get a [2, 3, 4] tensor, you are “good to go”, and this should run on the GPU automatically. (Note the commented-out --device specification, which throws a “zsh:1: number expected” error that I don’t understand. Please explain if you do!)
import time
start = time.time()
!whisper 'testing2-audio.m4a' --model tiny.en --threads 8 #--device torch.device("mps")
end = time.time()
duration = end - start
print(f"Transcription took {duration} seconds.")
Comes out at 15 seconds for a 54-second file that took forever beforehand, so I don’t think there can be any doubt that this invoked the GPU. Phew! Only took three years for Apple to honour its responsibilities as a vendor.
Thank you for following up! I prefer PyTorch to TensorFlow as well.
I’m sure many MacOS users will find this helpful.
Thanks Ronald, but I have a supplementary question. Maybe this should be in another thread, but since we’re all here …
I at least am confused about what “CUDA” really is. At one level it is an Nvidia language for accessing their hardware, but at another it seems to be a general-purpose GPU-programming protocol. For example, the AMD “HIP” site gives the code below, and it generates the error “PyTorch was not compiled with CUDA option” if you try to run it.
My question is: OK, but how do I compile PyTorch with the CUDA option so that it “runs”, or at least “sits”, on an AMD GPU system, and should I even try? If I can, and can thereby run a whole menagerie of CUDA scripts on an AMD system, that’s fantastic, but …? Here’s the unedited exemplar material, which looks as though it should almost work, but fails because “PyTorch wasn’t compiled with CUDA option”:
cuda = torch.device('cuda')     # Default HIP device
cuda0 = torch.device('cuda:0')  # 'rocm' or 'hip' are not valid, use 'cuda'
cuda2 = torch.device('cuda:2')  # GPU 2 (these are 0-indexed)

x = torch.tensor([1., 2.], device=cuda0)
# x.device is device(type='cuda', index=0)
y = torch.tensor([1., 2.]).cuda()
# y.device is device(type='cuda', index=0)

with torch.cuda.device(1):
    # allocates a tensor on GPU 1
    a = torch.tensor([1., 2.], device=cuda)

    # transfers a tensor from CPU to GPU 1
    b = torch.tensor([1., 2.]).cuda()
    # a.device and b.device are device(type='cuda', index=1)

    # You can also use ``Tensor.to`` to transfer a tensor:
    b2 = torch.tensor([1., 2.]).to(device=cuda)
    # b.device and b2.device are device(type='cuda', index=1)

    c = a + b
    # c.device is device(type='cuda', index=1)

    z = x + y
    # z.device is device(type='cuda', index=0)

    # even within a context, you can specify the device
    # (or give a GPU index to the .cuda call)
    d = torch.randn(2, device=cuda2)
    e = torch.randn(2).to(cuda2)
    f = torch.randn(2).cuda(cuda2)
    # d.device, e.device, and f.device are all device(type='cuda', index=2)
Good question. CUDA is proprietary to Nvidia. I believe AMD has a porting solution? Hopefully someone else can shed more light; that’s the extent of what I know.
MPS, or Metal Performance Shaders, is basically the equivalent of CUDA for most Macs (especially newer ones).
For my ML workloads, I use my M1 Ultra-based Mac with 48 GPU cores and Metal 3 support. It can render an image using Stable Diffusion in less than 30 seconds. So not crazy fast, but at least I am using those GPU cores.
Modern GPUs, though, can have thousands of cores on each card. Usually we are talking about Nvidia (non-Mac) cards here. An example is the Nvidia A100, which is what AI datacenters use: it has 6912 CUDA cores! So way more power than my Mac!
I am not absolutely sure of this, but I suspect that everything I wrote about getting Whisper to use the AMD Radeon Pro 5500M turns out not to be so. Although the GPU is quite obviously available, because it runs the little test program that generates a simple tensor (above) and identifies itself as having done so, it doesn’t seem to work with my local installation of whisper, and in fact throws up the most bizarre sequence of errors if I try, so I am not sure what is going on.
Running a 16-minute audio transcription on the CPU takes about 175 seconds on tiny.en to produce a perfectly acceptable transcription that isn’t in Welsh (yes, it’s still that file), so provided I set up a loop and leave it running overnight to transcribe dozens of others, it will probably do. But does anyone know how to call the GPU? What follows clearly doesn’t work, and creates all sorts of issues, from “MPS backend out of memory” through something about “NotImplemented: aten::empty.memory_format for SparseMPS” (whatever that is), to references to being unable to retrieve cached PyTorch checkpoints, and all sorts of other guff that I suspect is entirely spurious.
This fails inside a Jupyter Notebook running PyTorch (delete the last two terms and it runs fine on the CPU, and may work even better by setting --threads n):
!whisper {file_name} --model tiny.en --device "mps"
Incidentally, this has afforded a wonderful illustration of where completion decoders fail, for all their undoubted strengths, power and utility. Because they work on the “we are where we are” principle, they’ll happily go on digging into the hole you’re already in when what they should be saying is “maybe we shouldn’t be where we are”. I’ve been down quite a few rabbit-holes pursuing this with gpt-3.5-turbo holding my hand, all to no avail. Maybe I should try GPT-4?
It now seems clear that Whisper is not going to run on the MPS backend unless and until someone extends support for sparse tensors to MPS. I’ve done some investigation of the aten::empty.memory_format NotImplementedError, and it turns out that somewhere in the background Whisper tries to save memory by using sparse tensor formats, but these don’t transfer to the MPS GPU, at least for now. There’s a GitHub discussion here: “Add aten::empty.memory_format for SparseMPS”, Issue #87886, pytorch/pytorch on GitHub.
As I’ve said in a post there, it’s easy to recreate the aten:: error by creating a sparse tensor and trying to move it to the mps device:
>>> s = torch.sparse_coo_tensor(
...     torch.tensor([[1, 1, 2],
...                   [0, 2, 1]]),
...     torch.tensor([9, 10, -3]),
...     size=(3, 3))
>>> s.to_dense()
tensor([[ 0,  0,  0],
        [ 9,  0, 10],
        [ 0, -3,  0]])
Then
device = torch.device('mps')
s.to(device)
immediately generates the reported error.
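For what it’s worth, densifying the tensor first does move it to the GPU, which is one way to confirm that the failure is specific to the sparse layout. This is a small sketch, assuming an MPS-capable PyTorch build; float values are used to stay within MPS dtype support.
import torch

device = torch.device("mps")

s = torch.sparse_coo_tensor(
    torch.tensor([[1, 1, 2], [0, 2, 1]]),
    torch.tensor([9.0, 10.0, -3.0]),
    size=(3, 3),
)

# s.to(device) raises the NotImplementedError for aten::empty.memory_format,
# but converting to a dense layout first works fine on the MPS device.
d = s.to_dense().to(device)
print(d.device)  # mps:0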
This is no longer an OpenAI problem, but it may be of interest to others trying to use whisper on a Mac.
We are seeing this too. We have had multiple reports of transcriptions coming back in Welsh.
For those trying to run Whisper locally and struggling with GPU support on Macs, I would highly recommend taking a look at whisper.cpp.
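A minimal quick start, in case it helps; these commands reflect the whisper.cpp README as I remember it, so check the repo for the current instructions (the input needs to be a 16 kHz mono WAV file, hence the ffmpeg step):
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
make                                          # builds the CPU binary; uses Accelerate on macOS

bash ./models/download-ggml-model.sh base.en  # fetch an English-only model

# convert your audio to 16 kHz mono WAV, then transcribe, forcing English
ffmpeg -i input.m4a -ar 16000 -ac 1 -c:a pcm_s16le audio.wav
./main -m models/ggml-base.en.bin -f audio.wav -l en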