"Unrecognized File Format" Error with MemoryStream When Using OpenAI's Whisper API in C#

Hello everyone,

I’m currently working on integrating OpenAI’s Whisper API into a C# application to transcribe audio files directly from S3 storage, without writing them to disk. Despite setting the MIME type correctly and ensuring the file name is included in the multipart form-data, I consistently receive an error stating “Unrecognized file format.”

Background: The goal is to stream audio files stored in S3 directly to the Whisper API without creating a temporary file on the server. This approach should help maintain efficiency and security by avoiding disk I/O.

Code Snippet: Here is the core method I am using:

```csharp
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;
using Newtonsoft.Json;

public async Task<string> SpeechToTextAsync(string audioFileName, string model = "whisper-1")
{
    // Download the S3 object into memory; no temporary file is written.
    var buffer = new MemoryStream();
    await _s3Service.DownloadFileToBufferAsync(audioFileName, buffer);
    buffer.Position = 0; // rewind, otherwise StreamContent would send zero bytes

    // Disposing the form also disposes fileContent and the underlying buffer.
    using var content = new MultipartFormDataContent();
    var fileContent = new StreamContent(buffer);
    fileContent.Headers.ContentType = new MediaTypeHeaderValue(GetMimeType(audioFileName));

    // The file name (including its extension) goes into the part's Content-Disposition header.
    content.Add(fileContent, "file", audioFileName);
    content.Add(new StringContent(model), "model");

    // _httpClient is assumed to already carry the "Authorization: Bearer <API key>" header.
    using var response = await _httpClient.PostAsync(
        "https://api.openai.com/v1/audio/transcriptions",
        content);

    var result = await response.Content.ReadAsStringAsync();
    if (!response.IsSuccessStatusCode)
    {
        return $"API call failed: {result}";
    }

    // The default JSON response has the form { "text": "..." }.
    var transcription = JsonConvert.DeserializeObject<dynamic>(result);
    return transcription.text.ToString();
}

private static string GetMimeType(string fileName)
{
    var extension = Path.GetExtension(fileName).ToLowerInvariant();
    return extension switch
    {
        ".mp3" => "audio/mpeg",
        ".wav" => "audio/wav",
        ".ogg" => "audio/ogg",
        // Add other cases as necessary
        _ => "application/octet-stream"
    };
}
```

Issue: Every attempt fails with this error response:

{ "error": { "message": "Invalid file format. Supported formats: ['flac', 'm4a', 'mp3', 'mp4', 'mpeg', 'mpga', 'oga', 'ogg', 'wav', 'webm']", "type": "invalid_request_error" } }

This error occurs despite confirming that the MIME type is correctly set and that the file name is correctly passed in the Content-Disposition. I suspect the issue might be with how the MemoryStream is handled or perceived by the API.
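One way to test the MemoryStream suspicion is to look at the bytes that would actually be uploaded. The sketch below uses a hypothetical `AudioSniffer.DetectFormat` helper (not part of any library) that reads the first 12 bytes, compares them against a short, assumed list of well-known container signatures, and then restores the stream position. A result of `"empty"` would mean the stream sits at its end, so nothing would be sent; `null`, or a format that disagrees with the file extension, might suggest that the S3 download returned something other than the expected audio.

```csharp
#nullable enable
using System.IO;
using System.Text;

public static class AudioSniffer
{
    // Returns a best-guess container name from the first bytes of a seekable
    // stream, "empty" if nothing is left to read, or null if no signature matches.
    // The stream's Position is restored so the same stream can still be uploaded.
    public static string? DetectFormat(Stream stream)
    {
        long start = stream.Position;
        var header = new byte[12];
        int read = 0;
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break; // end of stream
            read += n;
        }
        stream.Position = start;

        if (read == 0) return "empty";

        // True if the ASCII text appears at the given byte offset.
        bool Has(string ascii, int offset) =>
            read >= offset + ascii.Length &&
            Encoding.ASCII.GetString(header, offset, ascii.Length) == ascii;

        if (Has("RIFF", 0) && Has("WAVE", 8)) return "wav";
        if (Has("OggS", 0)) return "ogg";
        if (Has("fLaC", 0)) return "flac";
        if (Has("ID3", 0)) return "mp3 (ID3 tag)";
        if (Has("ftyp", 4)) return "mp4/m4a";
        // EBML magic number used by WebM (and Matroska).
        if (read >= 4 && header[0] == 0x1A && header[1] == 0x45 && header[2] == 0xDF && header[3] == 0xA3)
            return "webm";
        // 11-bit frame sync; shared by MPEG audio and ADTS AAC, so only a hint.
        if (read >= 2 && header[0] == 0xFF && (header[1] & 0xE0) == 0xE0)
            return "mpeg-audio";
        return null;
    }
}
```

For example, calling `AudioSniffer.DetectFormat(buffer)` right after `buffer.Position = 0;` in `SpeechToTextAsync` and logging the result would show whether the in-memory data looks like the format its file name claims.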

Attempts to Resolve:

  1. Checked MIME type setting - it’s correctly mapped based on the file extension.
  2. Ensured the Content-Disposition is correctly setting the file name.
  3. Reviewed similar issues in other languages (e.g., Python implementations using io.BytesIO), which hinted at similar problems but didn’t directly translate to a solution in C#.
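Attempt 1 checked the mapping for the extensions it handles, but `GetMimeType` only covers `.mp3`, `.wav`, and `.ogg`; anything else, including an S3 key whose name lacks one of those extensions, falls back to `application/octet-stream`. A sketch of a broader mapping that covers each extension listed in the error message might look like this. `AudioMimeTypes` and `IsSupportedExtension` are hypothetical names, and the MIME strings are common conventions, not values taken from OpenAI's documentation.

```csharp
using System.IO;

public static class AudioMimeTypes
{
    // Maps every extension named in the error's "Supported formats" list to a
    // commonly used MIME type (assumed values; the error only lists extensions).
    public static string GetMimeType(string fileName)
    {
        var extension = Path.GetExtension(fileName).ToLowerInvariant();
        return extension switch
        {
            ".flac" => "audio/flac",
            ".m4a" or ".mp4" => "audio/mp4",
            ".mp3" or ".mpga" or ".mpeg" => "audio/mpeg",
            ".oga" or ".ogg" => "audio/ogg",
            ".wav" => "audio/wav",
            ".webm" => "audio/webm",
            _ => "application/octet-stream"
        };
    }

    // True when the file name ends in one of the extensions the error message lists.
    public static bool IsSupportedExtension(string fileName) =>
        GetMimeType(fileName) != "application/octet-stream";
}
```

Checking `IsSupportedExtension(audioFileName)` before downloading from S3 would catch keys without a usable extension early; whether the API relies on the MIME type, the extension, or the bytes themselves is not established in this thread.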

Questions:

  1. Has anyone faced a similar issue with MemoryStream or in-memory file handling when interacting with APIs expecting file uploads?
  2. Are there nuances with MultipartFormDataContent in .NET that might cause the file format to be unrecognized even with correct MIME types?
  3. Any suggestions on modifications or diagnostics tools to better understand how the API is interpreting the received data?
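For the third question, one low-tech diagnostic is to serialize the form locally and read it. The hypothetical `MultipartDebug.DumpAsync` helper below builds a form shaped like the one in `SpeechToTextAsync` and returns the top-level `Content-Type`, the body length, and the raw body as text, so the boundary, each part's `Content-Disposition` and `Content-Type` headers, and the leading audio bytes can be compared with a request known to work (for example, one sent with curl). It assumes .NET 5 or later for `Encoding.Latin1`; nothing is sent over the network.

```csharp
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

public static class MultipartDebug
{
    // Builds a form shaped like the one in SpeechToTextAsync and returns its
    // serialized form as text instead of posting it.
    public static async Task<string> DumpAsync(byte[] audio, string fileName, string mimeType, string model)
    {
        using var content = new MultipartFormDataContent();

        // A fresh MemoryStream starts at Position 0, so all bytes are included.
        var fileContent = new StreamContent(new MemoryStream(audio));
        fileContent.Headers.ContentType = new MediaTypeHeaderValue(mimeType);
        content.Add(fileContent, "file", fileName);
        content.Add(new StringContent(model), "model");

        // Serializes the whole body: boundaries, per-part headers, and payloads.
        byte[] raw = await content.ReadAsByteArrayAsync();

        // Latin-1 maps each byte to exactly one char, so binary audio bytes do not
        // garble the dump; only the header lines are meant to be read.
        var sb = new StringBuilder();
        sb.Append("Content-Type: ").Append(content.Headers.ContentType).Append("\r\n");
        sb.Append("Content-Length: ").Append(raw.Length).Append("\r\n\r\n");
        sb.Append(Encoding.Latin1.GetString(raw));
        return sb.ToString();
    }
}
```

Current .NET versions typically write both `filename=` and `filename*=` parameters for a named file part; the dump makes such details visible, without implying that either one is the cause of the error.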

Thank you for any insights or suggestions you might offer. This issue has been a significant blocker, and any help would be greatly appreciated!

As it often does, it turned out to be something completely unrelated. The code itself is fine. It's solid code, so the thread can stay up for others calling Whisper from C#; otherwise, feel free to remove it.

RESOLVED

I’m running into the same issue (Unrecognized file format) with code that worked just fine a few weeks ago.

Edit:
It turned out that something changed in the Manifest V3 implementation for WebExtensions: you can no longer send an arrayBuffer from a content script to a service worker. To send the captured audio, you first need to convert the arrayBuffer into a base64-encoded string.