Hello! I am building a website where a user can record themselves and get a transcription of the recording from the Whisper API. The recordings themselves seem fine, as the files are intelligible after they are processed, but when I feed them into the API, only the first few seconds of the transcription are returned. I’m not sure why this is happening, and other discussions of this issue don’t seem to have reached a solution. Here’s my Flask code:
# Setup assumed from the rest of the app (not shown in the original post)
from flask import Flask, jsonify, request
import openai

app = Flask(__name__)


@app.route('/transcribe_audio', methods=['POST'])
def transcribe_audio():
    try:
        file = request.files['audio']
        audio_data = file.read()
        # Save the uploaded recording to disk; the context manager closes the file
        with open("backend/audios/audio.mp4", "wb") as audio:
            audio.write(audio_data)
        # Make a request to the Whisper API for transcription using the OpenAI Python library
        with open("backend/audios/audio.mp4", "rb") as audio_file:
            response = openai.Audio.transcribe("whisper-1", audio_file)
        print(response)
        # Extract the transcribed text from the response
        transcribed_text = response['text']
        return jsonify({'transcribed_text': transcribed_text}), 200
    except Exception as e:
        print(f"Error processing audio: {str(e)}")
        return jsonify({'error_message': str(e)}), 500
Has anyone run into this before, or does anyone know how to fix it? I’m wondering if there’s something off with the file encoding.
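One way to test the file-encoding hypothesis would be to re-encode the saved recording to plain WAV before sending it to the API. The sketch below is only a diagnostic under stated assumptions: it assumes the `ffmpeg` command-line tool is installed and on the PATH, the helper names `build_reencode_command` and `reencode` are hypothetical, and nothing in this thread confirms that re-encoding resolves the truncation.

```python
import subprocess


def build_reencode_command(src_path, dst_path):
    """Build an ffmpeg command that rewrites src_path as mono 16 kHz audio.

    Hypothetical helper for illustration; the output container is chosen
    by ffmpeg from the extension of dst_path (for example ".wav").
    """
    return [
        "ffmpeg",
        "-y",            # overwrite dst_path if it already exists
        "-i", src_path,  # input file as saved by the Flask route
        "-vn",           # drop any video stream
        "-ac", "1",      # downmix to a single audio channel
        "-ar", "16000",  # resample to 16 kHz
        dst_path,
    ]


def reencode(src_path, dst_path):
    """Run ffmpeg (assumed to be installed) and raise if it fails."""
    subprocess.run(build_reencode_command(src_path, dst_path), check=True)
    return dst_path
```

In the route above, this could be tried by calling `reencode("backend/audios/audio.mp4", "backend/audios/audio.wav")` and opening the `.wav` file for the API request instead. If the full transcription comes back, the original container is the likely culprit; if not, the encoding is probably not the cause.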
This seems identical to an issue another forum member posted a couple of hours ago. Can I ask whether you are using an Apple product to do the recording or any audio manipulation?
Update: this is definitely an API-related bug, as I just tried using the GitHub version of Whisper in my web app and it worked perfectly. Hopefully this gets resolved soon, as I’d much rather use the official API!