I am planning to use open ai api in the aws lambda function.
The expected work is:
I have a subtitle file, which is translated by voice to text and it have some mistakes like punctuations and some word list which are incorrect.
By requesting the open ai api I want to correct those punctuations and word lists, btw the word list should also be provided manually.
As I am new to this , I don’t know to get started, if there is anyone to help it will be appreciated. Thank you!
Maybe look at using something like Whisper for transcription that has a much lower word error rate?
Calling from Lambda is easy, you can even skip the OpenAI SDK and just do requests directly to the API from Lambda.
Thank you for the advice but for speech to text GCP is used already
I would switch. It would solve your first problem. And you wouldn’t even need AI for cleanup, so it also solves your second problem. It’s a simpler design without the AI cleanup.
I had AWS Transcribe forever, and the word error rates were atrocious.
Then if that doesn’t work, especially after prompting the model for better transcriptions, then you can try the AI cleanup.
Thank you for the reply. I will surely try that.
@curt.kennedy actually I needed transcription for japanese language. so, is whisper better in that case.
Thank you, it will be very helpful
Is whisper is better in japanese language transcription also?
I have no idea about Japanese and Whisper quality. You’d have to try it and see.
“Better” is subjective. Better than software developed in Japan?
I meant is transcription of whisper in japanese is as good as it of in english?
Does the UK prime minister speak English better than the Japanese prime minister speaks Japanese?
Subjectively, hard to compare from an individual standpoint unless you are native-level fluent and literate in both.
Whisper large-v2 scores with a slightly lower Japanese word error rate on CommonVoice9, but a higher error rate on Fleurs in the paper “Robust Speech Recognition via Large-Scale Weak Supervision”
It also might depend on if you are sampling Japanese gangster movies and needing accurate Kanji, or trying to understand Scottish English.
I think you misunderstood it, I wanted to use speech-to-text application for japanese video, so i was asking which will be good to use.