Error with Whisper when trying to get a transcription using a Java HTTP POST and a streamed audio file

I’ve been trying to write a server endpoint that receives an audio file’s binary data as an InputStream and uses it to call the transcriptions endpoint, but I’ve been running into various issues throughout the process. I’ve worked through most of them, but now I’m consistently hitting this error:

{
  "error": {
    "message": "Invalid file format. Supported formats: ['flac', 'm4a', 'mp3', 'mp4', 'mpeg', 'mpga', 'oga', 'ogg', 'wav', 'webm']",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}

I’ve played around with various file types but always receive this error. I’ve looked at every thread I can find here and elsewhere on the matter but can’t figure out the issue. Looking for guidance on how I can triage/fix this. Any help is appreciated!

I’ve tried sending mp3, webm, and wav files in the request. I’ve gotten the endpoint to work when using the example from the docs:

curl --request POST \
  --url https://api.openai.com/v1/audio/transcriptions \
  --header 'Authorization: Bearer TOKEN' \
  --header 'Content-Type: multipart/form-data' \
  --form file=@/path/to/file/openai.mp3 \
  --form model=whisper-1

But in my case the audio file won’t be stored on the local machine, so I can’t follow the same format.
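For context, my endpoint pulls the upload out of the incoming multipart request before handing it to the OpenAI call. Here’s a minimal sketch of that step (the servlet API and the "file" part name are assumptions that mirror my test requests below):

    import java.io.IOException;
    import java.io.InputStream;

    import jakarta.servlet.ServletException;
    import jakarta.servlet.http.HttpServletRequest;
    import jakarta.servlet.http.Part;

    // Hypothetical sketch (assumed servlet API): how the endpoint extracts
    // the uploaded audio before handing it to the OpenAI call further down.
    public class AudioUploadHelper {
      /** "file" matches the part name used in the test requests below. */
      public static InputStream openAudioStream(final HttpServletRequest request)
          throws IOException, ServletException {
        final Part filePart = request.getPart("file");
        // filePart.getSubmittedFileName() and filePart.getSize() supply the
        // fileName and fileSize values passed to OpenAIFileContentBody below.
        return filePart.getInputStream();
      }
    }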

To test my endpoint, I’ve been sending requests like the one below, generally matching what I expect client requests to look like:

curl 'MY_ENDPOINT_URI' \
  -H 'content-type: multipart/form-data; boundary=----keykeykey' \
  --data-raw $'------keykeykey\r\nContent-Disposition: form-data; name="file"; filename="audio.webm"\r\nContent-Type: audio/webm\r\n\r\n' \
  --data-binary @audio.webm \
  --data-raw $'\r\n------keykeykey\r\nContent-Disposition: form-data; name="filesize"\r\nContent-Type: text/plain\r\n\r\n186406' \
  --data-raw $'\r\n------keykeykey\r\nContent-Disposition: form-data; name="model"\r\nContent-Type: text/plain\r\n\r\nwhisper-1\r\n------keykeykey--\r\n' \
  --compressed
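To sanity-check what actually reaches the endpoint, I’ve also tried dumping the first bytes of the raw request body before any parsing (a debugging sketch, again assuming servlet-style access to the stream; readNBytes requires Java 9+):

    // Debugging sketch: print the first bytes the endpoint receives, to
    // verify the multipart framing and the audio bytes survive the trip
    // from curl intact.
    final byte[] head = request.getInputStream().readNBytes(512);
    System.out.println(new String(head, java.nio.charset.StandardCharsets.ISO_8859_1));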

Then I’m routing that request to my API, which looks like this:

    // Create an HttpClient to execute the request we're about to build.
    try (final CloseableHttpClient httpClient = HttpClients.createDefault()) {
      // Create a POST request to the OpenAI transcriptions endpoint.
      final HttpPost request = new HttpPost("https://api.openai.com/v1/audio/transcriptions");

      // Attach our API key as a bearer token.
      request.addHeader("Authorization", "Bearer " + getAPIKey());

      // The file we're transcribing is stored within an InputStream we received from our endpoint request. In order to let OpenAI
      // know how to parse our byte stream, we need to build this multipart entity.
      request.setEntity(MultipartEntityBuilder.create()
                                              .setContentType(ContentType.MULTIPART_FORM_DATA)
                                              .addPart(OPEN_AI_FILE_KEY, new OpenAIFileContentBody(inputStream, fileName, Long.valueOf(fileSize)))
                                              .addPart(OPEN_AI_MODEL_KEY, new StringBody(model, ContentType.DEFAULT_TEXT))
                                              .build());

      // Execute our request, then retrieve the response and return the output as a JSON string.
      try (final CloseableHttpResponse response = httpClient.execute(request);
           final ByteArrayOutputStream output = new ByteArrayOutputStream()) {
        ... get response & return ...
      }
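
For reference, OpenAIFileContentBody is a small custom ContentBody that streams the endpoint's InputStream straight into the multipart body instead of buffering the whole file. A minimal sketch of it (assuming Apache HttpClient 4.x's httpmime module; the audio/webm content type is hard-coded here for illustration):

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    import org.apache.http.entity.ContentType;
    import org.apache.http.entity.mime.MIME;
    import org.apache.http.entity.mime.content.AbstractContentBody;

    public class OpenAIFileContentBody extends AbstractContentBody {

      private final InputStream inputStream;
      private final String fileName;
      private final long contentLength;

      public OpenAIFileContentBody(final InputStream inputStream, final String fileName, final long contentLength) {
        // Illustrative assumption: in practice the content type should match
        // the actual encoding of the uploaded bytes.
        super(ContentType.create("audio/webm"));
        this.inputStream = inputStream;
        this.fileName = fileName;
        this.contentLength = contentLength;
      }

      @Override
      public String getFilename() {
        // Whisper uses this extension to infer the audio format, so it must
        // agree with the real format of the bytes (e.g. "audio.webm").
        return fileName;
      }

      @Override
      public long getContentLength() {
        return contentLength;
      }

      @Override
      public String getTransferEncoding() {
        return MIME.ENC_BINARY;
      }

      @Override
      public void writeTo(final OutputStream out) throws IOException {
        // Copy the request's InputStream into the outgoing multipart body.
        final byte[] buffer = new byte[8192];
        int read;
        while ((read = inputStream.read(buffer)) != -1) {
          out.write(buffer, 0, read);
        }
        out.flush();
      }
    }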