Hello! I’m building a website where users can record themselves and get a transcription of the recording from the Whisper API. The recording part seems to work, since the saved files play back intelligibly. But when I send them to the API, the returned transcription only covers the first few seconds of audio. I’m not sure why this is happening, and the other discussions I’ve found about this issue never seem to have reached a solution. Here’s my Flask code:
```python
import openai
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route('/transcribe_audio', methods=['POST'])
def transcribe_audio():
    try:
        file = request.files['audio']
        audio_data = file.read()
        # Save the uploaded recording; `with` makes sure the file is flushed and closed
        with open("backend/audios/audio.mp4", "wb") as audio:
            audio.write(audio_data)
        # Make a request to the Whisper API for transcription using the OpenAI Python library
        # (openai.Audio.transcribe is the pre-1.0 openai library interface)
        with open("backend/audios/audio.mp4", "rb") as audio_file:
            response = openai.Audio.transcribe("whisper-1", audio_file)
        print(response)
        # Extract the transcribed text from the response
        transcribed_text = response['text']
        return jsonify({'transcribed_text': transcribed_text}), 200
    except Exception as e:
        print(f"Error processing audio: {str(e)}")
        return jsonify({'error_message': str(e)}), 500
```
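One thing I notice is that the route always saves the upload as `audio.mp4`, whatever format the browser actually recorded. Below is a minimal sketch of a variation I could try, assuming the frontend sends a meaningful MIME type with the upload. `extension_for_mimetype` and its `MIME_TO_EXT` table are hypothetical names of my own, and the table is only a guess at common MediaRecorder outputs, not an official list.

```python
# Hypothetical helper: choose a file extension from the MIME type the browser
# reported for the upload, instead of always assuming ".mp4".
# The mapping is an assumption covering a few common recorder outputs.
MIME_TO_EXT = {
    "audio/webm": "webm",
    "video/webm": "webm",
    "audio/mp4": "mp4",
    "video/mp4": "mp4",
    "audio/mpeg": "mp3",
    "audio/wav": "wav",
    "audio/x-wav": "wav",
}


def extension_for_mimetype(mimetype: str, default: str = "mp4") -> str:
    """Return an extension for a MIME type such as 'audio/webm;codecs=opus'."""
    # Drop parameters like ";codecs=opus" and normalise case and whitespace
    base = mimetype.split(";", 1)[0].strip().lower()
    # Fall back to the original ".mp4" behaviour for unknown types
    return MIME_TO_EXT.get(base, default)


# Possible use inside the Flask route:
#   ext = extension_for_mimetype(file.mimetype)
#   audio_path = f"backend/audios/audio.{ext}"
```

Werkzeug's `FileStorage` exposes both `content_type` (with parameters) and `mimetype` (without them), and the helper strips parameters itself, so either can be passed in.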
Has anyone run into this before, or does anyone know how to fix it? I’m wondering if there’s something funky with the file encoding or container format.
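To test the encoding theory, one quick check is to look at the first few bytes of the saved file and see which container it really is. My understanding is that browser recorders (Chrome's MediaRecorder in particular) often produce WebM rather than MP4, but I haven’t confirmed that for my setup. This is a minimal diagnostic sketch, not a fix: `sniff_container` and `sniff_file` are hypothetical helpers, and the check only covers a handful of well-known file signatures.

```python
from typing import Optional


def sniff_container(header: bytes) -> Optional[str]:
    """Guess the container format from the first bytes of an audio file."""
    # WebM/Matroska files start with the EBML magic number 0x1A45DFA3
    if header.startswith(b"\x1a\x45\xdf\xa3"):
        return "webm"
    # MP4/M4A files usually begin with an "ftyp" box (type at byte offset 4)
    if header[4:8] == b"ftyp":
        return "mp4"
    # WAV files are RIFF containers with a "WAVE" form type at offset 8
    if header.startswith(b"RIFF") and header[8:12] == b"WAVE":
        return "wav"
    # Ogg pages start with the capture pattern "OggS"
    if header.startswith(b"OggS"):
        return "ogg"
    # MP3 files usually start with an ID3 tag or an MPEG frame sync (11 set bits)
    if header.startswith(b"ID3") or (
        len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0
    ):
        return "mp3"
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read the first 12 bytes of a file and guess its container."""
    with open(path, "rb") as f:
        return sniff_container(f.read(12))


# Possible use after saving the upload:
#   print(sniff_file("backend/audios/audio.mp4"))
```

If this reports `webm` for a file saved as `audio.mp4`, the extension and the actual contents disagree, which would be worth fixing before looking elsewhere.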