I’m using the MediaRecorder API to record voice in the browser. It works well on my laptop; however, on my phone the transcription comes back wrong.
Initially, on my iPhone, starting and stopping a recording didn’t do anything at all, so I tried changing the audio format from audio/webm to audio/mpeg. That got my app to return the conversation between me and the AI, but the results are still wrong.
Sometimes it claims I said “MBC 뉴스 이덕영입니다” (which oddly translates to “This is Lee Deok-young from MBC News.”), or something random like “Bye!”
I’m using Next.js; here’s my code for recording voice:
const startRecording = async () => {
  // Request access to the user's microphone
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

  // Create a new MediaRecorder instance with the audio stream
  mediaRecorder.current = new MediaRecorder(stream);
  const chunks: Blob[] = [];

  // Collect audio data as it becomes available
  mediaRecorder.current.ondataavailable = (e) => chunks.push(e.data);

  // When recording stops, combine the chunks and upload them
  mediaRecorder.current.onstop = async () => {
    // Combine all audio chunks into a single blob
    const audioBlob = new Blob(chunks, { type: "audio/wav" });

    // Create FormData to send the audio file to the server
    const formData = new FormData();
    formData.append("audio", audioBlob, "recording.wav");

    // Send the audio file to the server for processing
    console.log("Sending audio blob:", audioBlob);
    const response = await fetch("/api/process-audio", {
      method: "POST",
      body: formData,
    });
    const data = await response.json();
    // ... use the transcription data here
  };

  mediaRecorder.current.start();
};
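One thing I’ve been experimenting with (not sure if it’s the right fix) is asking the browser which container it actually supports instead of hard-coding one. The `isSupported` callback below is injected only so I could test the helper outside a browser; in the real app I’d pass `MediaRecorder.isTypeSupported` and hand the result to both `new MediaRecorder(stream, { mimeType })` and the Blob:

```typescript
// Hypothetical helper: return the first MIME type the browser can record.
// Safari typically supports "audio/mp4", while Chrome/Firefox support
// "audio/webm", so we try the candidates in order.
function pickSupportedMimeType(
  candidates: string[],
  isSupported: (type: string) => boolean,
): string | undefined {
  return candidates.find(isSupported);
}

// In the browser this would be:
// const mimeType = pickSupportedMimeType(
//   ["audio/webm", "audio/mp4", "audio/mpeg"],
//   MediaRecorder.isTypeSupported,
// );
// mediaRecorder.current = new MediaRecorder(stream, { mimeType });
```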
And here’s my code for processing the audio:
export async function POST(request: Request) {
  // Extract the audio file from the incoming request
  const formData = await request.formData();
  const audioFile = formData.get("audio") as File;

  // Step 1: Transcribe the audio using OpenAI's Whisper model
  console.log("Audio file received:", audioFile);
  const transcription = await openai.audio.transcriptions.create({
    file: audioFile,
    model: "whisper-1",
  });
  console.log("Transcription result:", transcription.text);
  // ... rest of the handler
}
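Related to this, I sketched a small helper to keep the upload filename’s extension consistent with the Blob’s actual MIME type, since I suspect the mismatch between a file named “recording.wav” and data that is really webm/mp4 might be what confuses Whisper. The mapping below is my own guess at sensible extensions, not taken from any docs:

```typescript
// Hypothetical helper: derive an upload filename whose extension matches
// the Blob's real MIME type, so the server/Whisper sees consistent metadata.
const EXTENSION_BY_MIME: Record<string, string> = {
  "audio/webm": "webm",
  "audio/mp4": "mp4",
  "audio/mpeg": "mp3",
  "audio/wav": "wav",
};

function filenameForBlob(mimeType: string): string {
  // Strip codec parameters like ";codecs=opus" before looking up.
  const base = mimeType.split(";")[0].trim();
  const ext = EXTENSION_BY_MIME[base] ?? "webm"; // fall back to webm
  return `recording.${ext}`;
}

// Usage sketch: formData.append("audio", audioBlob, filenameForBlob(audioBlob.type));
```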
Is this the best way to record audio from the microphone on a user’s phone? It would be amazing to get it working as smoothly as the ChatGPT app, or at least to get the transcript right using this API. How can I resolve this issue?
Thanks!