The goal is to lower the barrier for non-English speakers to use the OpenAI stack. Text works too, but speech is the new addition.
Demos: I have conversations in 5 languages (English, Hindi, Spanish, Portuguese, Indonesian), together covering around 1 billion WhatsApp users: https://yella.co.in/call/html/index.html
Trial Link: https://wa.me/14087570747
Some initial observations of quirks in the current tech stack:
- It's SLOW (~10 secs end-to-end): about 3 secs for Whisper, 5 secs for ChatGPT, 2 secs for the rest. Currently running on VERY low-end hardware.
- Whisper has near-human error rates for high-resource languages (English, Spanish, French…).
- With low-resource languages (even ones like Kannada, which the Whisper docs list as having low error rates), Whisper often detects the wrong language (e.g. Tamil). This is a failure imo.
- With high-resource languages, ChatGPT produces results in the spoken language.
- With low-resource languages, ChatGPT leans towards English output. This is a failure imo, because English is simply not spoken or read in most low-resource language territories.
- With low-resource languages, ChatGPT often produces utter garbage. So "low resource" doesn't just mean scarce Whisper training data; it applies to GPT training data too.
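A partial workaround for the last three quirks is to stop relying on auto-detection: pass an explicit `language` hint to Whisper and pin ChatGPT's reply language via the system prompt. A minimal sketch, assuming the OpenAI Python SDK (openai>=1.0) and an `OPENAI_API_KEY` in the environment; the function names and prompt wording here are my own, not part of the API:

```python
def build_messages(user_text: str, lang_code: str) -> list[dict]:
    """Pin ChatGPT's output language so low-resource replies don't drift to English.

    lang_code is an ISO 639-1 code, e.g. "kn" for Kannada.
    """
    return [
        {"role": "system",
         "content": f"Reply only in the language with ISO 639-1 code '{lang_code}'."},
        {"role": "user", "content": user_text},
    ]


def transcribe_and_reply(audio_path: str, lang_code: str) -> str:
    """Whisper -> ChatGPT, with the language forced at both stages."""
    from openai import OpenAI  # assumes the openai>=1.0 SDK is installed

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(audio_path, "rb") as f:
        # Passing `language` skips Whisper's auto-detect, avoiding
        # e.g. Kannada audio being transcribed as Tamil.
        transcript = client.audio.transcriptions.create(
            model="whisper-1", file=f, language=lang_code)
    chat = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=build_messages(transcript.text, lang_code))
    return chat.choices[0].message.content
```

This doesn't fix garbage output when the GPT side simply lacks training data for the language, but in my experience pinning the language at both stages is cheap insurance against the detection flips.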