Using Node.js library createTranscription() function without saving a file

I am already successfully sending a recording from my React frontend to my Express.js backend, which then sends it to Whisper.
Right now, I am using multer to store the file in a folder on my server and passing the file path to createTranscription() like this:

const storage = multer.diskStorage({
	destination: (req, file, cb) => cb(null, 'uploads/'),
	filename: (req, file, cb) => cb(null, req.clientId + '.wav')
})
const upload = multer({ storage })

async function getTranscription (path) {
	try {
		const response = await openai.createTranscription(fs.createReadStream(path), 'whisper-1')
		return response?.data?.text
	} catch (error) {
		console.log('THE ERROR:', error)
	}
}

app.use('/uploads', express.static('uploads'))

app.post('/api/upload-audio', upload.single('data'), async (req, res) => {
	const transcription = await getTranscription(req.file.path)
})

I tried to use multer.memoryStorage() and req.file.buffer so I wouldn’t have to write the file to disk, since I only need it temporarily to send it to Whisper, but I couldn’t get it to work. The axios request made by the library always fails with a 400 Bad Request status.

I tried converting the buffer to a stream (since fs.createReadStream returns a stream, as far as I understand) using several methods that I found on the web. I also asked ChatGPT for help, but no luck.
I tried each of the following:

// Attempt 1: build a Readable by hand
const readableInstanceStream = new Readable()
readableInstanceStream.push(buffer)
readableInstanceStream.push(null)
return readableInstanceStream

// Attempt 2: streamifier
return streamifier.createReadStream(buffer)

// Attempt 3: Readable.from
return stream.Readable.from(buffer)

without any luck. Can anyone help me?

You’re close. I managed to get this to work by setting the read stream’s path property. It looks like the openai library is doing a bit of hacking under the hood.
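
A minimal sketch of that workaround, assuming the v3 library's upload code takes the file name (and so the audio format) from the stream's path property, which is the property fs.createReadStream sets. The helper name bufferToNamedStream and the file name audio.wav are made up for illustration and are not part of the openai package:

```typescript
import { Readable } from "node:stream";

// Hypothetical helper: wraps an in-memory buffer in a readable stream and
// gives it a `path`, mimicking the property that fs.createReadStream() sets.
function bufferToNamedStream(buffer: Buffer, filename: string): Readable & { path: string } {
    // objectMode: false keeps this a plain byte stream; the Buffer is emitted as one chunk.
    const stream = Readable.from(buffer, { objectMode: false });
    // Assumption: the upload code reads the file name and extension from `path`.
    return Object.assign(stream, { path: filename });
}

// Usage with multer.memoryStorage(), so nothing is written to disk:
// const audio = bufferToNamedStream(req.file.buffer, "audio.wav");
// const response = await openai.createTranscription(audio, "whisper-1");
```

The extension in the file name should match the actual audio format of the recording, since that appears to be what the 400 response was complaining about when no name was available.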

Do you guys know the solution to this issue with the new update to the API from November 2023? I’ve asked a question about this: https://community.openai.com/t/creating-readstream-from-audio-buffer-for-whisper-api/534380 but there has been no response so far.

As of 2 Aug 2024 this is what worked for me:

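import OpenAI, { toFile } from "openai";

// Reads the API key from the OPENAI_API_KEY environment variable.
const openai = new OpenAI();

// `audio` can be an in-memory Buffer (e.g. multer's req.file.buffer) or a readable stream.
async function voiceToText(audio: Buffer | NodeJS.ReadableStream) {
    const response = await openai.audio.transcriptions.create({
        model: "whisper-1",
        file: await toFile(audio, "audio.wav"), // << here: gives the data a file name, nothing is written to disk
        response_format: "text",
    });

    // With response_format "text" the API returns a plain string.
    return response as unknown as string;
}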

Unfortunately, I cannot credit the source post because I cannot insert links here…

I did it! Love you so much!