Whisper API is not able to transcribe audio recorded on iOS

Hi there!

I have run into a very frustrating issue: even though this transcription API generally works perfectly, and the downloaded audio file is always intelligible, recordings made on iOS are not transcribed correctly. Here is my FastAPI route:

async def transcribe_route(
    lang: str, file: UploadFile = File(...), duration: float = Form(...)
):
    """User sends a WebM file and the Whisper API converts it to text"""

    print(f"lasted {duration} seconds")

    # Check that the file is a webm file
    if "audio/webm" not in file.content_type:
        raise HTTPException(status_code=400, detail="Only .webm files are supported.")

    # Read the audio file data
    recording_content = await file.read()
    recording = io.BytesIO(recording_content)
    recording.name = file.filename

    # Save the audio file locally
    save_path = f"{duration}-{file.filename}"  # Use a unique filename
    with open(save_path, "wb") as audio_file:
        audio_file.write(recording_content)

    # Call the Whisper API
    transcription = openai.Audio.transcribe(
        "whisper-1", recording, language=languages_mapping.get(lang, "en")
    )

    # Return the transcribed text
    return transcription["text"]

For some reason, when I send an audio recorded on iOS, Whisper is only able to transcribe the first 1-2 seconds.
I think this may be caused by the different encoding used on iOS, but there seems to be no way of fixing it client-side.
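One way to check that theory is to look at the magic bytes of the uploaded data: iOS Safari's MediaRecorder typically produces an MP4/AAC container even when the rest of your clients send WebM, so a file can carry a .webm name while actually being MP4. A minimal sketch (the helper name `sniff_container` is my own, not part of any library):

```python
def sniff_container(data: bytes) -> str:
    """Guess the audio container from its magic bytes.

    Useful for spotting iOS recordings that arrive as MP4
    despite a .webm filename or content type.
    """
    if data[:4] == b"\x1a\x45\xdf\xa3":  # EBML header -> WebM/Matroska
        return "webm"
    if data[4:8] == b"ftyp":  # ISO BMFF 'ftyp' box -> MP4/M4A
        return "mp4"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":  # RIFF/WAVE header
        return "wav"
    return "unknown"
```

You could call it right after `await file.read()` and log the result for uploads that fail to transcribe fully.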

What can I do to solve it?

Thanks in advance,

So you actually have the failing audio files logged for analysis, and they are intelligible when played back but can't be transcribed?

Here is a re-encoding you could do server-side. It also recodes down to roughly voice-over-IP audio bandwidth, so anything like noise shaping in high-definition audio would be stripped.
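A minimal sketch of that re-encoding, assuming ffmpeg is installed on the server (the function names are my own; the exact sample rate and codec are one reasonable choice, not the only one):

```python
import subprocess


def build_reencode_cmd(src_path: str, dst_path: str) -> list:
    """Build an ffmpeg command that re-encodes any input container
    to 16 kHz mono 16-bit PCM WAV -- close to voice bandwidth,
    stripping high-definition extras in the process."""
    return [
        "ffmpeg", "-y",       # overwrite the output file if it exists
        "-i", src_path,       # input: whatever the client uploaded
        "-ac", "1",           # downmix to mono
        "-ar", "16000",       # resample to 16 kHz
        "-c:a", "pcm_s16le",  # plain 16-bit little-endian PCM
        dst_path,
    ]


def reencode(src_path: str, dst_path: str) -> None:
    """Run the re-encode; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_reencode_cmd(src_path, dst_path), check=True)
```

You would call `reencode(save_path, wav_path)` after saving the upload, then send the resulting WAV file to Whisper instead of the original recording.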