Hi, I am recording audio in the browser using MediaRecorder and sending the file to the OpenAI Whisper API for transcription. For some reason it only picks up one word, or other times returns just a bunch of random characters, when I am using an iPhone. It works well on Android and on my computer.
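Roughly, my setup looks like this (a simplified sketch, not my exact code):

// Records a few seconds of microphone audio into a single Blob, which I
// then POST to the /v1/audio/transcriptions endpoint as multipart form
// data (file + model=whisper-1).
async function recordAudio(durationMs: number): Promise<Blob> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks: Blob[] = [];

  recorder.ondataavailable = (event) => chunks.push(event.data);
  const stopped = new Promise<void>((resolve) => {
    recorder.onstop = () => resolve();
  });

  recorder.start();
  setTimeout(() => recorder.stop(), durationMs);
  await stopped;

  return new Blob(chunks, { type: recorder.mimeType });
}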
I am having the exact same issue: it works in Chrome/Safari on desktop and it works on Android, but on iOS I just get strange results. A simple audio file of me saying "test test test" results in a jumble of Chinese characters.
I too am facing this exact issue on iPhone, and I was also able to reproduce it in Safari on a MacBook. Since Safari and Chrome on iPhone essentially run on the same engine, I think this is a Safari-related issue.
Were you able to find any workarounds or solutions?
Hi!
Below this topic is a bunch of "Related" topics.
This has been a recurring issue and I suggest you work through those first.
Note: not the "Suggested" topics.
Hey, I am encountering the same issue on iOS. Has anyone here been able to resolve this?
I'm also experiencing this same issue. Has anything worked for you? I've tried multiple encoding formats, but nothing works in Chrome or Safari on iPhone.
I had the same issue.
Be sure to set the timeslice option when starting, like: componentMediaRecorder.start(1000);
This resolved the problem for me.
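In context, the suggestion looks roughly like this (simplified sketch; variable names are illustrative):

// Passing a timeslice (in ms) to start() makes MediaRecorder fire
// dataavailable periodically instead of once at stop(), so chunks are
// flushed as they are produced.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const componentMediaRecorder = new MediaRecorder(stream);
const chunks: Blob[] = [];

componentMediaRecorder.ondataavailable = (event) => {
  if (event.data.size > 0) chunks.push(event.data);
};
componentMediaRecorder.onstop = () => {
  // Reassemble the periodic chunks into a single file for upload.
  const blob = new Blob(chunks, { type: componentMediaRecorder.mimeType });
  // ... then send blob for transcription
};

componentMediaRecorder.start(1000); // emit a chunk every 1000 ms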
componentMediaRecorder.start(1000); does not work for me. The file is recorded in webm format, and it works in Firefox/Chrome on desktop and on Android. But on iPhone it returns a 400 invalid format error.
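For anyone else hitting the 400: it's worth logging recorder.mimeType, because Safari's MediaRecorder does not support webm at all (it records audio/mp4), so the file may not actually be what its name claims. A rough sketch of picking a supported container up front (the candidate list here is illustrative):

// Sketch: ask the browser which container it can actually produce,
// rather than assuming webm everywhere. Candidate list is illustrative.
const candidates = ["audio/webm;codecs=opus", "audio/webm", "audio/mp4"];
const mimeType = candidates.find((t) => MediaRecorder.isTypeSupported(t));

const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);

// When uploading, match the filename extension to what was recorded;
// the API infers the format from the extension, so a file named
// "audio.webm" with mp4 bytes inside is exactly the kind of mismatch
// that can come back as a 400.
const extension = recorder.mimeType.includes("mp4") ? "mp4" : "webm";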
Is there any solution yet? I get totally weird transcriptions on the iPhone, always something like "Lee Deok-Young from MBC News speaking" or similar. This only happens on the iPhone.
Had all these issues as well. It looks like Whisper works best with a specific kind of file: mono channel, 16 kHz sample rate, and pcm_s16le (16-bit little-endian PCM) encoding. You can use ffmpeg to convert your audio to these settings, or do it in the browser with the Web Audio API. This is what worked for me:
async function convertAudioToMono(file: File | Blob): Promise<Blob> {
  // Decode the recording; with the context at 16 kHz the browser
  // resamples during decode.
  const audioContext = new AudioContext({ sampleRate: 16000 });
  const arrayBuffer = await file.arrayBuffer();
  const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
  await audioContext.close(); // release it; Safari caps concurrent contexts

  // Create a mono offline context for processing. Sizing by duration
  // keeps the length correct even if decode did not resample.
  const frameCount = Math.ceil(audioBuffer.duration * 16000);
  const offlineContext = new OfflineAudioContext(1, frameCount, 16000);
  const source = offlineContext.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(offlineContext.destination);
  source.start();

  // Render audio
  const renderedBuffer = await offlineContext.startRendering();

  // Convert to WAV format: 16-bit PCM is 2 bytes per sample, mono
  const length = renderedBuffer.length * 2;
  const buffer = new ArrayBuffer(44 + length);
  const view = new DataView(buffer);

  // WAV header
  const writeString = (view: DataView, offset: number, string: string) => {
    for (let i = 0; i < string.length; i++) {
      view.setUint8(offset + i, string.charCodeAt(i));
    }
  };
  writeString(view, 0, "RIFF");
  view.setUint32(4, 36 + length, true); // RIFF chunk size
  writeString(view, 8, "WAVE");
  writeString(view, 12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // audio format: PCM
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, 16000, true); // sample rate
  view.setUint32(28, 32000, true); // byte rate = 16000 * 2 bytes * 1 channel
  view.setUint16(32, 2, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeString(view, 36, "data");
  view.setUint32(40, length, true); // data chunk size

  // Write audio data: clamp each float sample, scale to signed 16-bit
  const data = renderedBuffer.getChannelData(0);
  let offset = 44;
  for (let i = 0; i < data.length; i++) {
    const sample = Math.max(-1, Math.min(1, data[i]));
    view.setInt16(offset, sample < 0 ? sample * 0x8000 : sample * 0x7fff, true);
    offset += 2;
  }

  return new Blob([buffer], { type: "audio/wav" });
}
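To wire it up, I call it roughly like this before uploading (simplified sketch; helper names other than convertAudioToMono are illustrative, and the endpoint and model name are the standard Whisper API values):

// Sketch: convert the MediaRecorder output first, then upload the WAV.
// Naming the file ".wav" matters because the API infers the format
// from the filename extension.
async function transcribe(recording: Blob, apiKey: string): Promise<string> {
  const wav = await convertAudioToMono(recording);

  const form = new FormData();
  form.append("file", wav, "recording.wav");
  form.append("model", "whisper-1");

  const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!res.ok) throw new Error(`Transcription failed: ${res.status}`);
  const { text } = await res.json();
  return text;
}

If you'd rather convert server-side, the ffmpeg equivalent of the conversion above is: ffmpeg -i input -ac 1 -ar 16000 -c:a pcm_s16le output.wav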