Hi Daniel, I am facing the same issue. It’s working on the simulator but not on an iPhone, where the audio format is m4a. Please share if you fixed it.
Hi, I am also facing the same issue using the record library for Flutter on iOS.
Here is my code for recording:
await record.start(
  path: path,
  encoder: AudioEncoder.aacLc, // by default
  bitRate: 128000, // by default
  samplingRate: 44100, // by default
);
After that, I upload the file to Google Drive to trigger a pipeline which invokes the Whisper API:
// Here `File` is the Drive metadata class from package:googleapis/drive/v3.dart,
// while `file` below is the recorded audio file on disk.
final DriveApi driveApi = await _getDriveApi();
File fileToUpload = File();
fileToUpload.parents = [_googleDriveFolderId];
fileToUpload.name = '${file.getFileName()}.m4a';
fileToUpload.mimeType = 'audio/mp4';
try {
  await driveApi.files.create(
    fileToUpload,
    uploadMedia: Media(file.openRead(), file.lengthSync()),
  );
} catch (_) {
  print('Failed to upload to Google Drive');
}
Then I get the same invalid file format issue. Can anyone please give some advice? Thank you
Dear All,
To overcome the iOS audio issue, I used the “mic-recorder-to-mp3” npm package. It allows for seamless audio recording and ensures compatibility across platforms, including iOS.
Since it records your microphone input and produces an audio/mp3 file, there is no file format issue at all.
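For reference, here is a minimal recording sketch with mic-recorder-to-mp3 as I understand its API (setBlobURL is a placeholder for your own React state setter, not part of the package):

import MicRecorder from "mic-recorder-to-mp3";

// One recorder instance for the component; bitRate is in kbps per the package docs
const recorder = new MicRecorder({ bitRate: 128 });

const startRecording = () => recorder.start();

const stopRecording = () =>
  recorder
    .stop()
    .getMp3()
    .then(([buffer, blob]) => {
      // blob is audio/mp3, so the format is accepted regardless of the browser
      setBlobURL(URL.createObjectURL(blob)); // setBlobURL: your own state setter
    })
    .catch((e) => console.error(e));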
Once you get the blob URL, convert it to base64 using the code below:
useEffect(() => {
  if (blobURL != null) {
    fetch(blobURL)
      .then(response => response.arrayBuffer())
      .then(arrayBuffer => {
        const base64Data = btoa(new Uint8Array(arrayBuffer).reduce((data, byte) => data + String.fromCharCode(byte), ''));
        const rawData = `base64,${base64Data}`;
        fetchData(rawData);
      })
      .catch(error => console.error(error));
  }
}, [blobURL]);
Then use your transcription API to get the text back from the base64 data above.
With this approach there is no invalid file format issue on iOS.
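For illustration, a rough sketch of what that server-side step could look like on Node 18+ (the rawData prefix matches the snippet above; the exact backend shape and OPENAI_API_KEY handling are assumptions on my part):

async function transcribeBase64(rawData: string): Promise<string> {
  // rawData arrives as "base64,<data>" from the client snippet above
  const base64 = rawData.replace(/^base64,/, "");
  const audioBuffer = Buffer.from(base64, "base64");

  const formData = new FormData();
  formData.append("file", new Blob([audioBuffer], { type: "audio/mp3" }), "audio.mp3");
  formData.append("model", "whisper-1");

  const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
    body: formData, // fetch sets the multipart boundary itself
  });
  const json = await res.json();
  return json.text;
}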
Thanks, the polyfill suggestion fixed my issue with using MediaRecorder audio in Vite.
Hi all, stumbled upon this in my googling and believe I was having the same Safari problem. I would get one-word or empty responses from the API.
I can’t explain it, but when I pass a timeslice parameter to mediaRecorder.start(), Safari seems to produce files that work. This is with mimeType: ‘audio/mp4’.
mediaRecorder.start(1000);
I just figured this out, have not tested extensively. Definitely would be curious if anyone can explain lol.
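For anyone who wants the surrounding context, here is a rough sketch of how I mean it (the function and callback names are placeholders; the start(1000) timeslice is the actual point):

async function startSafariRecording(onFile: (blob: Blob) => void) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mediaRecorder = new MediaRecorder(stream, { mimeType: "audio/mp4" });
  const chunks: Blob[] = [];

  mediaRecorder.ondataavailable = (e) => {
    if (e.data.size > 0) chunks.push(e.data);
  };
  mediaRecorder.onstop = () => {
    // Name this audio.mp4 when appending it to FormData for /v1/audio/transcriptions
    onFile(new Blob(chunks, { type: "audio/mp4" }));
  };

  // Passing a 1000 ms timeslice is what made Safari produce usable files for me
  mediaRecorder.start(1000);
  return mediaRecorder;
}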
I just wanted to point out that this is still an issue. I see it is marked with a solution but it is still a problem.
I am using this code, and it only fails to work on Safari on iPhones. I always get Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']
const audioBlob = new Blob(this.recordedChunks, { type: "audio/mp3" });
const formData = new FormData();
const file = new File([audioBlob], "audio.mp3", { type: "audio/mp3" });
formData.append("file", file, "audio.mp3");
formData.append("model", "whisper-1");
axios
  .post("https://api.openai.com/v1/audio/transcriptions", formData, {
    headers: {
      Authorization: `Bearer ${import.meta.env.VITE_OPENAI_API_KEY}`,
      "Content-Type": "multipart/form-data"
    }
  })
  .then((response) => {
    const userResponse = response?.data?.text;
It also doesn’t work if I change these lines:
const file = new File([audioBlob], "audio.mp3", { type: "audio/mp3" });
formData.append("file", file, "audio.mp3");
to
formData.append("file", audioBlob, "audio.mp3");
Check my post above. Curious if it works for you too.
Hey, super old post, but I’m getting desperate while struggling with this exact same problem right now.
I can’t figure out how to get the [audio-recorder-polyfill] code included in my sveltekit vite project.
I’ve tried the ‘quick hack’ suggestion of including
import AudioRecorder from 'https://cdn.jsdelivr.net/npm/audio-recorder-polyfill/index.js'
window.MediaRecorder = AudioRecorder
I’ve also tried downloading that file and putting it and the associated code in my /lib folder, but no go.
I’m beginning to think it’s something very specific to my Sveltekit preprocessor, but I don’t have enough expertise to figure out what’s causing the issue. The error message is extremely unhelpful.
My code in my front end looks like
onMount(() =>{
import MediaRecorder from '../../lib/audio-recorder-polyfill//index.js'
window.MediaRecorder = MediaRecorder
})
And the error is popping up as
[plugin:vite-plugin-svelte] Error while preprocessing /home/jared/nomnomnom/nomnomnom/src/routes/home/+page.svelte - Transform failed with 1 error:
/home/jared/nomnomnom/nomnomnom/src/routes/home/+page.svelte:12:9: ERROR: Unexpected "MediaRecorder"
/home/jared/nomnomnom/nomnomnom/src/routes/home/+page.svelte
Unexpected "MediaRecorder"
10 |
11 | onMount(() =>{
12 | import MediaRecorder from '../../lib/audio-recorder-polyfill//index.js'
| ^
13 | window.MediaRecorder = MediaRecorder
14 | })
I have a project that was working perfectly in desktop browser and it’s now falling apart on mobile because of the mp4 → whisper problem.
Replying to my own post in case this helps anyone. After hours of poking at this I finally got it. In SvelteKit you either need to disable SSR for your page, or you can load the module in an onMount block:
onMount(async () => {
  const { default: AudioRecorder } = await import('audio-recorder-polyfill')
  window.MediaRecorder = AudioRecorder
})
I’m finally able to submit microphone recordings for transcription from iOS!
mediaRecorder.start(1000);
This fixed it for me, after weeks of trying other fixes! Thanks so much.
None of the solutions mentioned earlier in this thread worked for my environment in Next.js using the App Router. But now, finally, after some trial and error, I figured it out and thought I’d post my solution here as well for others who are still struggling.
Using the vanilla Safari MediaRecorder API worked to record audio/mp4 blobs, but sending them to the Whisper API always gave me transcripts like “Hello”, “Thank You” or “Bye”, no matter what the content of the recording was. That’s even after @michellep posted that the backend was updated.
Using mediaRecorder.start(1000) didn’t work for me; it would just upload the recorded blob after each second of recording, which aligns with the Mozilla docs. It’s actually a mystery to me how other people made it work with that setting.
I also tried recordRTC.js and got that working eventually using audio/wav, but wasn’t satisfied with this solution as the blobs are way larger than with audio/webm or audio/mpeg.
Solution that worked for me
Other solutions above mentioned audio-recorder-polyfill, which is hard to use in a Next.js App Router environment due to the server-side rendering by default. Even ‘use client’ wouldn’t do the trick as it usually does. But now I finally found what I had to do to make it work:
In the parent component that needs the recording button, I’m importing the recording button component like so:
// MyComponent.tsx
import dynamic from "next/dynamic"
import React from "react"

export default function MyComponent() {
  const RecordingButton = dynamic(() => import("./RecordingButton"), { ssr: false })
  return (
    <div>
      {/* other stuff */}
      <RecordingButton />
    </div>
  )
}
And then inside the RecordingButton component I’m only importing the polyfill if audio/webm isn’t supported by the browser:
// RecordingButton.tsx
const supportsWebm = typeof MediaRecorder !== "undefined" && MediaRecorder.isTypeSupported("audio/webm")

if (!supportsWebm) {
  // Dynamically import the polyfill if 'audio/webm' is not supported
  Promise.all([import("audio-recorder-polyfill"), import("audio-recorder-polyfill/mpeg-encoder")])
    .then(([AudioRecorderModule, mpegEncoderModule]) => {
      const AudioRecorder = AudioRecorderModule.default
      const mpegEncoder = mpegEncoderModule.default
      AudioRecorder.encoder = mpegEncoder
      AudioRecorder.prototype.mimeType = "audio/mpeg"
      window.MediaRecorder = AudioRecorder
    })
    .catch((error) => {
      console.error("Error importing polyfill:", error)
    })
}
After that I was able to just use the regular browser MediaStream Recording API (you can just ask ChatGPT how to use that from here on).
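For completeness, a rough sketch of that recording flow once the polyfill is registered (sendToWhisper is a placeholder for whatever upload function you use, not part of any library):

// Minimal flow once window.MediaRecorder is (possibly) the polyfill
async function record(sendToWhisper: (file: File) => void) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true })
  const recorder = new MediaRecorder(stream) // polyfilled on Safari, native elsewhere
  const chunks: Blob[] = []

  recorder.ondataavailable = (e) => chunks.push(e.data)
  recorder.onstop = () => {
    // audio/webm natively, audio/mpeg when the polyfill is active
    const ext = recorder.mimeType === "audio/mpeg" ? "mp3" : "webm"
    sendToWhisper(new File(chunks, `recording.${ext}`, { type: recorder.mimeType }))
  }

  recorder.start()
  return recorder // call recorder.stop() when the user is done talking
}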
I like this solution best, because I still get to use compressed formats and don’t have to use .wav and also I can just use the regular MediaStream API.
P.S.: Unfortunately I had to exclude all links to docs and libraries. It would be nice if links were enabled, to allow for higher-quality posts.
@jonnylangefeld 's solution initially worked for me, thanks for that.
However, it has a bug in a progressive web app (PWA) context on iOS Safari. It initially works, but after putting the app in the background and bringing it back to the foreground it no longer works (despite reinitialising anything that could potentially be reinitialised): the recording blob is empty. I would not be surprised if there were similar issues in a webview as well, though that’s just a guess.
However, the solution detailed by @neet.kes, i.e. converting to an mp3 directly on the device, for instance with mic-recorder-to-mp3, works.
To be clear, I think that’s potentially an issue with the polyfill, not with your specific implementation.
Hi, I’m adding myself to the list of people suffering from this problem (as of November 27th, 2023).
Like xacto pointed out, this is still an issue even though the thread is marked as resolved. cc @michellep
I’ve gone through this thread and others and observed the same behaviour described by 0x41mmar (the .mp3 file encoded on Safari, on both iOS and macOS, always returns the error Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg'], even though the audio file plays fine on my machine. I also tried .webm, same issue).
My setup:
- a Next.js app (v14) built from the vercel/ai/next-openai boilerplate from Vercel
- MediaRecorder client-side to record the audio
- openai.audio.transcriptions.create() to send the audio to OpenAI server-side, on the edge runtime (a rough sketch of that call is below)
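For reference, the server-side call looks roughly like this; treat it as a simplified sketch rather than my exact code (the route path and request parsing are just examples):

// app/api/transcribe/route.ts (illustrative path)
import OpenAI from "openai"

export const runtime = "edge"

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

export async function POST(req: Request) {
  const formData = await req.formData()
  const audio = formData.get("file") as File // the recorded blob from the client

  const transcription = await openai.audio.transcriptions.create({
    file: audio,
    model: "whisper-1",
  })

  return Response.json({ text: transcription.text })
}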
I don’t want to invest in a solution using ffmpeg as I’m just making a prototype at the moment.
I tried the solution proposed by @jonnylangefeld, but somehow the audio-recorder-polyfill just returns a silent file for me (the length is correct, but the input sound is not encoded). I still need to try a couple of things that have been mentioned here.
Overall I find this very frustrating because everyone in this thread seems to be using various workarounds to the same problem, which sounds like it could be resolved in a more straightforward way on OpenAI’s end.
I am also facing this issue today, Dec 1, 2023. I’m using React (once this is working and I’m ready to deploy it, I’ll use a separate JS backend as recommended for security purposes). Here’s my code.
Setting up the stream:
navigator.mediaDevices.getUserMedia({ video: true, audio: true })
  .then((currentStream) => {
    setStream(currentStream);
    // show the webcam/microphone info in the video for the current user
    if (myVideo.current) {
      myVideo.current.srcObject = currentStream;
    }
    // use the MediaStream Recording API
    const mediaRecorder = new MediaRecorder(currentStream)
    setMediaRecorder(mediaRecorder);
    // make the dataavailable event fire every 20 seconds
    mediaRecorder.start(20000);
    mediaRecorder.ondataavailable = handleDataAvailable;
    mediaRecorder.onstop = handleStop;
  })
  .catch((err) => {
    alert(`The following error occurred: ${err}`);
  });
Handle data available:
const handleDataAvailable = (event: BlobEvent) => {
  console.log("#handleDataAvailable");
  // get the Blob from the event
  const eventBlob = event.data
  if (eventBlob && eventBlob.size > 0) {
    const totalSpeechToTextChunksRecorded = speechToTextChunks.concat(eventBlob);
    setSpeechToTextChunks(totalSpeechToTextChunksRecorded);
    // Send to Whisper API
    const recordingFileName = `${recordingFilePrefix}-${totalSpeechToTextChunksRecorded.length}.mp4`;
    console.log('=========================')
    console.log(`recordingFileName: ${recordingFileName}`)
    //var audioFile = new File([eventBlob], recordingFileName, {type: "video/mp4"});
    api.transform.speechToText(eventBlob, recordingFileName)
      .then(transcription => {
        console.log(transcription.text)
      })
  }
}
I always get Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg'], even though I set the MIME type and append the .mp4 file extension to the filename as suggested.
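In case it helps others debug the same path, this is roughly what the wrapping looks like before the request goes out (api.transform.speechToText is my own wrapper, so the snippet below is only a simplified guess at what it does):

// Simplified sketch of what my api.transform.speechToText wrapper does with the blob
const audioFile = new File([eventBlob], recordingFileName, { type: "audio/mp4" });
const formData = new FormData();
formData.append("file", audioFile, recordingFileName);
formData.append("model", "whisper-1");
// formData is then POSTed to https://api.openai.com/v1/audio/transcriptions
// with the Authorization: Bearer <key> header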
I guess I’m forced to use a Python backend which will introduce more latency for a streamed transcription.
This is tough guys - please fix or help.
+1 on the above. This is 100% still an issue; it doesn’t work for me with Next.js on Safari.
+1. My tried and tested solution of ‘do nothing and wait till fixed’ isn’t working out so far.
I am having the same issue; however, I am recording audio from an Android device and trying to have it transcribed. I get the same error message everyone else here is seeing.
This topic has diverged so much over such a long period of time that even understanding what the problem(s) are can take several minutes of reading, and even then the specific details may differ for each user.
Closing this topic so that new topics can be created.
If you have this problem still please open a new topic and give specific details.