Hi there!
I have run into a very frustrating issue, even though this transcription API generally works perfectly and the downloaded audio file is always intelligible. Here is the endpoint:
```python
import io

import openai
from fastapi import APIRouter, File, Form, HTTPException, UploadFile

router = APIRouter()
# languages_mapping maps my app's language codes to ISO-639-1 codes (defined elsewhere)


@router.post("/transcribe/{lang}")
async def transcribe_route(
    lang: str, file: UploadFile = File(...), duration: float = Form(...)
):
    """User sends a WebM file and the Whisper API converts it to text."""
    print(f"lasted {duration} seconds")
    try:
        # Check that the uploaded file is a WebM file
        print(file.content_type)
        if "audio/webm" not in file.content_type:
            raise HTTPException(status_code=400, detail="Only .webm files are supported.")
        # Read the audio file data
        recording_content = await file.read()
        recording = io.BytesIO(recording_content)
        recording.name = file.filename
        # Save the audio file locally
        save_path = f"{duration}-{file.filename}"  # Use a unique filename
        with open(save_path, "wb") as audio_file:
            audio_file.write(recording_content)
        # Call the Whisper API
        transcription = openai.Audio.transcribe(
            "whisper-1", recording, language=languages_mapping.get(lang, "en")
        )
        # Return the transcribed text
        return transcription["text"]
    except HTTPException:
        raise
```
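For context, this is roughly how the client calls the endpoint. The URL, filename, and duration below are placeholders; I build the request with `requests` here only to show the multipart shape without actually sending it:

```python
import requests

# Illustrative multipart upload matching the endpoint's signature above.
fake_webm = b"\x1a\x45\xdf\xa3" + bytes(32)  # WebM files start with the EBML magic bytes
req = requests.Request(
    "POST",
    "http://localhost:8000/transcribe/en",  # placeholder URL
    files={"file": ("recording.webm", fake_webm, "audio/webm")},
    data={"duration": "3.5"},
).prepare()  # .prepare() builds the request without sending it
print(req.headers["Content-Type"])  # multipart/form-data with a boundary
```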
For some reason, when I send audio recorded on iOS, Whisper is only able to transcribe the first 1-2 seconds.
I think this may be caused by the different encoding iOS uses, but there seems to be no way of fixing it client-side.
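To check this hypothesis, here is a minimal sketch of how the actual container could be sniffed from the file's first bytes server-side, since `content_type` alone may not reflect what iOS really sends. The `sniff_container` helper is mine, not part of any API, and only distinguishes the two signatures relevant here:

```python
def sniff_container(data: bytes) -> str:
    """Guess the audio container from its leading magic bytes."""
    if data[:4] == b"\x1a\x45\xdf\xa3":
        return "webm"  # EBML header used by WebM/Matroska
    if data[4:8] == b"ftyp":
        return "mp4"   # ISO BMFF container (what iOS recorders typically produce)
    return "unknown"
```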
What can I do to solve it?
Thanks in advance,
Giovanni