Whisper API not transcribing audio files coming from an iPhone

Hi, I am recording audio in the browser using MediaRecorder and sending the file to the OpenAI Whisper API for transcription. For some reason, on an iPhone it only picks up one word, or sometimes just a bunch of random characters, but it works well on Android and on my computer.


I am having the exact same issue: it works in Chrome/Safari on the web and on Android, but on iOS I just get strange results. A simple audio file saying 'test test test' results in a jumble of Chinese characters.

I too am facing this exact issue on iPhone. I was also able to reproduce it in Safari on a MacBook. Since Safari and Chrome on iPhone both run on the same engine (WebKit), I think this is a Safari-related issue.
Were you able to find any workarounds or solutions?

Hi!
Below this topic is a bunch of 'Related' topics.
This has been a recurring issue, and I suggest you work through those first.

Note: not the 'Suggested' topics.


Hey, I am encountering the same issue on iOS. Has anyone here been able to resolve this?

I'm also experiencing this same issue. Has anything worked for you? I've tried multiple encoding formats, but nothing works in Chrome or Safari on iPhone.

I had the same issue.

Be sure to pass the timeslice argument when starting the recorder, like: componentMediaRecorder.start(1000);

This resolved the problem for me.
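
For reference, here's roughly what that looks like in context; a minimal sketch, with everything around start(1000) being illustrative:

async function startRecording(): Promise<MediaRecorder> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks: Blob[] = [];

  recorder.ondataavailable = (event) => {
    if (event.data.size > 0) chunks.push(event.data);
  };
  recorder.onstop = () => {
    // Assemble the periodic chunks into a single blob for upload
    const blob = new Blob(chunks, { type: recorder.mimeType });
    // ...send blob to the transcription endpoint...
  };

  // The timeslice argument: emit a data chunk roughly every 1000 ms
  // instead of a single blob at stop time
  recorder.start(1000);
  return recorder;
}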


componentMediaRecorder.start(1000); does not work for me.

The file is recorded in WebM format, and it works in Firefox/Chrome on desktop and on Android.

But on iPhone, it returns a 400 invalid format error.
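
My guess is that Safari isn't actually producing WebM, so the file extension doesn't match the bytes being sent. A quick sketch to check what the browser really supports (the candidate list is illustrative):

// Pick a container the current browser can actually record, and use the
// matching extension when uploading, since the API seems to infer the
// format from the filename.
function pickRecordingFormat(): { mimeType: string; extension: string } {
  const candidates = [
    { mimeType: "audio/webm", extension: "webm" },
    { mimeType: "audio/mp4", extension: "mp4" },
  ];
  for (const candidate of candidates) {
    if (MediaRecorder.isTypeSupported(candidate.mimeType)) return candidate;
  }
  throw new Error("No supported audio recording format found");
}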

Is there any solution yet? I get totally weird transcriptions on the iPhone. Always something like "Lee Deok-Young from MBC News speaking." This only happens on the iPhone.

I had all these issues as well. It looks like Whisper works best with a specific kind of file: mono channel, a 16 kHz sample rate, and pcm_s16le encoding. You can use ffmpeg to convert your audio with these settings (e.g. ffmpeg -i input.webm -ac 1 -ar 16000 -c:a pcm_s16le output.wav) or use the Web Audio API. This is what worked for me:

async function convertAudioToMono(file: File | Blob): Promise<Blob> {
  // Decode at 16 kHz; decodeAudioData resamples to the context's sample rate
  const audioContext = new AudioContext({ sampleRate: 16000 });
  const arrayBuffer = await file.arrayBuffer();
  const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
  await audioContext.close();

  // Render through a mono offline context to downmix to a single channel
  const offlineContext = new OfflineAudioContext(1, audioBuffer.length, 16000);
  const source = offlineContext.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(offlineContext.destination);
  source.start();

  // Render audio
  const renderedBuffer = await offlineContext.startRendering();

  // Convert to WAV: 16-bit PCM is 2 bytes per sample, plus a 44-byte header
  const length = renderedBuffer.length * 2;
  const buffer = new ArrayBuffer(44 + length);
  const view = new DataView(buffer);

  // WAV header
  const writeString = (view: DataView, offset: number, string: string) => {
    for (let i = 0; i < string.length; i++) {
      view.setUint8(offset + i, string.charCodeAt(i));
    }
  };

  writeString(view, 0, "RIFF");
  view.setUint32(4, 36 + length, true); // file size minus the first 8 bytes
  writeString(view, 8, "WAVE");
  writeString(view, 12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // audio format: 1 = PCM
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, 16000, true); // sample rate
  view.setUint32(28, 32000, true); // byte rate: 16000 * 1 channel * 2 bytes
  view.setUint16(32, 2, true); // block align: channels * bytes per sample
  view.setUint16(34, 16, true); // bits per sample
  writeString(view, 36, "data");
  view.setUint32(40, length, true); // data chunk size

  // Write audio data: clamp each float sample to [-1, 1], scale to 16-bit PCM
  const data = renderedBuffer.getChannelData(0);
  let offset = 44;
  for (let i = 0; i < data.length; i++) {
    const sample = Math.max(-1, Math.min(1, data[i]));
    view.setInt16(offset, sample < 0 ? sample * 0x8000 : sample * 0x7fff, true);
    offset += 2;
  }

  return new Blob([buffer], { type: "audio/wav" });
}
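
To use it, I convert the recorded blob before uploading, so the .wav extension matches the actual payload. A rough sketch of the upload side (names are illustrative; API key handling omitted):

// Hypothetical wrapper: convert first, then post the WAV for transcription.
async function transcribe(recordedBlob: Blob, apiKey: string): Promise<string> {
  const wavBlob = await convertAudioToMono(recordedBlob);

  const form = new FormData();
  form.append("file", wavBlob, "audio.wav"); // extension now matches the data
  form.append("model", "whisper-1");

  const response = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  const result = await response.json();
  return result.text;
}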