How to retrieve transcription duration in minutes using Whisper with NodeJS and the OpenAI API?

Hello OpenAI community!

I’m working on a project using NodeJS to interact with the OpenAI API for audio transcriptions using the Whisper model. So far, everything has been going quite well, but I have a specific question.

Is there a specific way to obtain the duration in minutes of a transcription performed with Whisper? I’m looking to integrate this information into my application, and I haven’t found a clear solution in the documentation.

I appreciate any guidance or suggestions you can provide. Thanks in advance!

Welcome @gfbane23

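When you say duration of the transcription, do you mean the timing of every segment within the transcription?

If so, you can set the response_format parameter in the request to "srt" or "vtt", starting from the boilerplate code provided in the API reference. Both subtitle formats return the text together with a start and end timestamp for each segment.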
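As a rough sketch of what can be done with an "srt" response once it comes back as plain text, the snippet below parses the cues and takes the end of the last cue as an approximate duration in minutes. The helper names (srtTimeToSeconds, parseSrt, durationMinutes) and the sample text are hypothetical and only for illustration. The request itself is not shown, and trailing silence after the last cue is not counted.

```javascript
// Convert an SRT timestamp such as "00:01:02,500" into seconds.
function srtTimeToSeconds(stamp) {
  const match = /^(\d+):(\d{2}):(\d{2})[,.](\d{1,3})$/.exec(stamp.trim());
  if (!match) throw new Error(`Unrecognized SRT timestamp: ${stamp}`);
  const [, h, m, s, ms] = match;
  return Number(h) * 3600 + Number(m) * 60 + Number(s) + Number(ms.padEnd(3, "0")) / 1000;
}

// Split SRT text into cues of the form { start, end, text }, times in seconds.
function parseSrt(srt) {
  return srt
    .replace(/\r\n/g, "\n")
    .trim()
    .split(/\n{2,}/)
    .map((block) => {
      const lines = block.split("\n");
      const timeLineIndex = lines.findIndex((line) => line.includes("-->"));
      if (timeLineIndex === -1) return null; // skip blocks without a time line
      const [start, end] = lines[timeLineIndex]
        .split("-->")
        .map((part) => srtTimeToSeconds(part));
      const text = lines.slice(timeLineIndex + 1).join(" ").trim();
      return { start, end, text };
    })
    .filter(Boolean);
}

// Approximate duration in minutes: the latest cue end time divided by 60.
function durationMinutes(cues) {
  if (cues.length === 0) return 0;
  return Math.max(...cues.map((cue) => cue.end)) / 60;
}

// Hypothetical sample in the shape of an "srt" response.
const sampleSrt = `1
00:00:00,000 --> 00:00:04,500
Hello and welcome.

2
00:01:28,000 --> 00:01:30,000
Thanks for listening.
`;

const sampleCues = parseSrt(sampleSrt);
console.log(sampleCues);
console.log(`${durationMinutes(sampleCues)} minutes`);
```

The "vtt" format uses a dot instead of a comma before the milliseconds, which is why the timestamp pattern accepts both. A full VTT file also starts with a WEBVTT header line, which this sketch simply skips because that block has no time line.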


Hello @sps, thank you very much for responding. Yes, I am referring to the timing of each part of the transcription, that is, the timestamp of the text as it is generated: something like (time in minutes and seconds) followed by (the text transcribed at that time). I will check the references you have given me, thank you very much! I couldn’t find anything in the documentation, but I hadn’t checked the API reference. I had also read some examples where this was only achieved with external libraries like fluent-ffmpeg.
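For the "time, then text" layout described above, another option not mentioned in this thread is response_format "verbose_json", which, to my understanding, returns a duration in seconds plus a segments array with start, end, and text fields. The sketch below assumes that shape and uses a hand-written sample object instead of a real API response. The helper names formatClock and toTimestampedLines are hypothetical.

```javascript
// Format a number of seconds as "mm:ss" (minutes are not capped at 59).
function formatClock(seconds) {
  const total = Math.floor(seconds);
  const mm = String(Math.floor(total / 60)).padStart(2, "0");
  const ss = String(total % 60).padStart(2, "0");
  return `${mm}:${ss}`;
}

// Build one "[mm:ss] text" line per segment.
// Assumes transcription.segments is an array of { start, end, text }.
function toTimestampedLines(transcription) {
  return transcription.segments.map(
    (segment) => `[${formatClock(segment.start)}] ${segment.text.trim()}`
  );
}

// Hypothetical sample in the assumed "verbose_json" shape.
const sampleResponse = {
  duration: 95.4, // seconds
  segments: [
    { start: 0.0, end: 4.2, text: " Hello and welcome." },
    { start: 65.0, end: 70.1, text: " This starts in the second minute." },
  ],
};

for (const line of toTimestampedLines(sampleResponse)) {
  console.log(line);
}
console.log(`Duration: ${(sampleResponse.duration / 60).toFixed(2)} minutes`);
```

Reading the duration field directly avoids decoding the audio locally with a tool such as fluent-ffmpeg, provided the response actually includes it.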

Update: I tried the references you provided, and I was able to get what I needed. Thank you very much; sometimes we get lost in the documentation when the answers are right there. Thanks again. Next time, I’ll make sure to review the documentation properly.

That's what the dev forum is for. Feel free to share your project's progress or ask questions.

PS: You can mark this as solved so that fellow community members can quickly find the solution.

