Sending a Blob to the Whisper API in React Native

I am trying to use the Whisper API in my React Native application.

The code from the API reference is:

import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI();

async function main() {
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream("/path/to/file/audio.mp3"),
    model: "whisper-1",
  });

  console.log(transcription.text);
}
main();

But there are two issues here:

  1. I can't use fs in React Native.
  2. I don't have a file, but a Blob saved in memory.

How can I successfully perform the call?

Thanks,

Stefano

Hi Stefano,
There is a similar library, react-native-fs, that could be used. However, it sounds like your main challenge is getting the audio into a readable format. I don't have a great answer for that beyond saving it to the file system in one of mp3, mp4, mpeg, mpga, m4a, wav, or webm and then pulling the newly created file. Can you get it into one of those formats to begin with?
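For example, something along these lines might work (an untested sketch on my end; it assumes react-native-fs plus React Native's FileReader, and saveBlobAsAudioFile is just a made-up helper name):

import RNFS from 'react-native-fs';

// Convert the in-memory Blob to base64 with FileReader, then write it out
// as an audio file that the transcription call can read back later.
async function saveBlobAsAudioFile(blob: Blob): Promise<string> {
  const base64 = await new Promise<string>((resolve, reject) => {
    const reader = new FileReader();
    reader.onerror = reject;
    reader.onloadend = () => {
      // reader.result looks like "data:audio/m4a;base64,AAAA..."; keep only the payload
      resolve((reader.result as string).split(',')[1]);
    };
    reader.readAsDataURL(blob);
  });

  const path = `${RNFS.CachesDirectoryPath}/recording.m4a`;
  await RNFS.writeFile(path, base64, 'base64');
  return path;
}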

Thanks for your quick reply.

I am using this library: expo-av.
When I record the audio, what I get is a Blob; if I run this on the device I can retrieve an mp3 file, but on web I get a blob URL in the form blob:http://localhost:8081/ccce98fa-4b07-46a1-954c-5c179cc5289a.
I would like to use this blob directly in order to call the Whisper API.

Let me see if I can use react-native-fs then, and/or save this blob into a proper audio format.
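For the web build, maybe something like this could work (just an untested sketch; transcribeWebBlob and the webm extension are my own assumptions, since browsers typically record webm/opus):

// Web-only sketch: use the Blob directly, no file system needed.
async function transcribeWebBlob(blobUrl: string, apiKey: string): Promise<string> {
  const audioBlob = await fetch(blobUrl).then((r) => r.blob());

  const formData = new FormData();
  // The third argument gives the part a filename; the extension helps the API
  // recognize the format.
  formData.append('file', audioBlob, 'recording.webm');
  formData.append('model', 'whisper-1');

  const response = await fetch('https://api.openai.com/v1/audio/transcriptions', {
    method: 'POST',
    headers: { Authorization: `Bearer ${apiKey}` },
    body: formData,
  });

  const json = await response.json();
  return json.text;
}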

I almost got it to work like this, but then I got an error:

{"error": {"code": null, "message": "Unrecognized file format. Supported formats: ['flac', 'm4a', 'mp3', 'mp4', 'mpeg', 'mpga', 'oga', 'ogg', 'wav', 'webm']", "param": null, "type": "invalid_request_error"}}

and my file is an m4a:
{"exists": true, "isDirectory": false, "md5": "d5a5059ac69d33e95ccc5769f3e1474b", "modificationTime": 1712698814.99156, "size": 109317, "uri": "file:///var/mobile/Containers/Data/Application/5AA38A99-FDA3-4DE3-AF0D-5B17A826BB3B/Library/Caches/ExponentExperienceData/@anonymous/medAssistant-3d9b04d3-5b20-47d4-8338-f8ecc821c836/AV/recording-C907BDC1-7249-42D0-A568-F5F221FDD541.m4a"}

async convertSpeechToText(audioUri: string): Promise<string | null> {
  try {
    const info = await FileSystem.getInfoAsync(audioUri);
    console.log(info);

    const audioBlob = await fetch(audioUri).then((r) => r.blob());
    const audiofile = new File([audioBlob], 'audiofile', {
      type: 'audio/m4a',
    });
    console.log('m4a');

    const formData = new FormData();
    formData.append('file', audiofile);
    formData.append('model', 'whisper-1');

    const response = await fetch(this.openAIEndpoint, {
      method: 'POST',
      body: formData,
      headers: {
        Authorization: `Bearer ${this.apiKey}`,
        'Content-Type': 'multipart/form-data',
      },
    });

    const transcription = await response.json();
    console.log(transcription);

    return transcription.text;
  } catch (error) {
    console.error(error);
    return null;
  }
}
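One thing I could still try (just a guess on my part, not a confirmed fix): on native React Native, FormData usually expects a { uri, name, type } descriptor instead of a File object, and dropping the explicit multipart Content-Type header lets fetch generate the boundary itself. Something like:

// Sketch of the same call with a file descriptor instead of a File object.
// this.openAIEndpoint and this.apiKey are the same fields as in the snippet above.
const formData = new FormData();
formData.append('file', {
  uri: audioUri,           // file:///.../recording-....m4a
  name: 'recording.m4a',   // give the part an extension so the format can be recognized
  type: 'audio/m4a',
} as any);
formData.append('model', 'whisper-1');

const response = await fetch(this.openAIEndpoint, {
  method: 'POST',
  body: formData,
  headers: {
    Authorization: `Bearer ${this.apiKey}`,
    // no explicit Content-Type: fetch adds multipart/form-data with its own boundary
  },
});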

I don't know if this will help, but when I was doing this with images and the Vision API, the image had to have an online URL. You couldn't use the local image location. I put the images in an S3 bucket and it solved it for me. Maybe try that here?

Oops!! I just found a solution… this worked for me:

  try {
    const response = await FileSystem.uploadAsync(
      this.openAIEndpoint,
      audioUri,
      {
        // Optional: Additional HTTP headers to send with the request.
        headers: {
          Authorization: `Bearer ${this.apiKey}`,
          // any other headers your endpoint requires
        },

        // Options specifying how to upload the file.
        httpMethod: 'POST',
        uploadType: FileSystem.FileSystemUploadType.MULTIPART,
        fieldName: 'file', // Name of the field for the uploaded file
        mimeType: 'audio/mpeg', // MIME type of the uploading file
        parameters: {
          // Optional: Any additional parameters you want to send with the file upload
          model: 'whisper-1', // For example, if you're using OpenAI's model parameter
        },
      }
    );

    console.log(JSON.stringify(response, null, 4));
  } catch (error) {
    console.error(error);
  }
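If I remember right, uploadAsync resolves with the response body as a raw string, so you still need to parse the JSON to get at the text, roughly:

// response.body is the raw JSON string from the API, e.g. {"text": "..."}
const transcription = JSON.parse(response.body);
console.log(transcription.text);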

Signed up just to say thank you @RealBrubru.


You are my savior!!! Thankssssssss 🙌

I ran into this problem too! For anyone else who ends up here like I did, I created an npm package called openai-react-native that handles file uploads with the Expo file system and chat completion streaming using React Native SSE, all while using the official Node SDK types and API wherever possible.


This solved the problem:

        const audiofile = new File([audioBlob], 'audiofile', {
            type: 'audio/wav',
        });

        const transcription = await this.openai.audio.transcriptions.create({
            file: audiofile,
            model: "whisper-1",
            response_format: "text",
        });
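In case it helps, audioBlob in the snippet above just comes from fetching the recording URI, same trick as earlier in the thread (recording here is assumed to be the expo-av Audio.Recording instance):

        // Turn the expo-av recording URI into a Blob for the File constructor.
        const recordingUri = recording.getURI();
        const audioBlob = await fetch(recordingUri!).then((r) => r.blob());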

This would be very welcome in the documentation.

omg. You saved my bacon.
Thank you for this! Been looking for ages!

I've implemented RealBrubru's solution, but I'm keen to try this one too.
Thanks!