Hi everyone,
I wanted to share with you a cost optimisation strategy I used recently when transcribing audio.
For context, I have voice recordings of online meetings, and I need to generate personalised material from them. For my use case I don't need a word-for-word transcription, because after transcribing I process and summarise the text with gpt-4o-mini and work from that summary.
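That's why I wrote a Python script that removes silences from the audio and speeds the clip up to 120% to 200% of its original speed. After this preprocessing, a 20-minute clip came down to about 3 to 5 minutes, and since transcription is usually billed by audio length, the cost drops roughly in proportion.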
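A minimal sketch of these two preprocessing steps, using only the Python standard library, might look like this. It is not the original script: it assumes 16-bit mono WAV input, and the function names, the 30 ms frame length and the RMS threshold of 500 are illustrative assumptions that would need tuning for real recordings.

```python
import array
import math
import sys
import wave


def remove_silence(samples, frame_len, threshold):
    """Keep only fixed-length frames whose RMS level reaches `threshold`."""
    kept = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= threshold:
            kept.extend(frame)
    return kept


def speed_up(samples, factor):
    """Naive resample: factor=1.5 plays at 150% speed (and raises the pitch)."""
    n_out = int(len(samples) / factor)
    return [samples[int(i * factor)] for i in range(n_out)]


def preprocess_wav(src_path, dst_path, factor=1.5, frame_ms=30, threshold=500):
    """Hypothetical helper: strip silence, then speed up a 16-bit mono WAV.

    frame_ms and threshold are assumed defaults, not tuned values.
    """
    with wave.open(src_path, "rb") as src:
        if src.getsampwidth() != 2 or src.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono WAV")
        rate = src.getframerate()
        pcm = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":  # WAV samples are stored little-endian
        pcm.byteswap()

    frame_len = max(1, rate * frame_ms // 1000)  # samples per analysis frame
    fast = speed_up(remove_silence(pcm, frame_len, threshold), factor)

    out = array.array("h", fast)
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)  # same rate, fewer samples -> shorter clip
        dst.writeframes(out.tobytes())
```

Usage would be something like `preprocess_wav("meeting.wav", "meeting_fast.wav", factor=1.5)` (hypothetical file names). Dropping samples this way shifts the pitch up along with the tempo; if that hurts recognition, a pitch-preserving time stretch such as ffmpeg's `atempo` filter is an alternative, and ffmpeg's `silenceremove` filter covers the first step.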
The transcription quality was surprisingly good. It missed some words, but it captured the context of the recording well. If transcription accuracy is very important to you, I'd keep the speed-up somewhere between 120% and 150%.
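The trade-off between speed and length is simple arithmetic: after silence removal, the clip length is divided by the speed factor. A small sketch, using hypothetical numbers (a 20-minute recording with 10 minutes of silence), shows how much audio is left at each speed:

```python
def processed_minutes(total_min, silence_min, speed):
    """Minutes of audio left after removing silence and speeding up.

    speed is the playback multiplier: 1.5 means 150% speed.
    """
    if not 0 <= silence_min <= total_min:
        raise ValueError("silence_min must be between 0 and total_min")
    if speed <= 0:
        raise ValueError("speed must be positive")
    return (total_min - silence_min) / speed


# Hypothetical example: 20-minute recording, 10 minutes of it silence.
for speed in (1.2, 1.5, 2.0):
    minutes = processed_minutes(20, 10, speed)
    print(f"{speed:.0%} speed -> {minutes:.1f} min ({minutes / 20:.0%} of original)")
```

Going from 200% down to 150% for better accuracy means sending a third more audio in this example, so per-minute cost rises by the same proportion.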
Thank you for reading; I hope you found it useful!