WhatsApp - Speak/Listen/Learn

The goal is to lower the barrier for non-English speakers to use the OpenAI stack. You can text too, but speech is a new addition.

Demos: I have conversations in 5 languages (English, Hindi, Spanish, Portuguese, Indonesian), which together cover about 1 billion WhatsApp users: https://yella.co.in/call/html/index.html

Trial Link: https://wa.me/14087570747

Some initial observations of quirks in the current tech stack:

  • It's SLOW (~10 secs end to end): about 3 secs for Whisper, 5 secs for ChatGPT, and 2 secs for everything else, running on VERY low-end hardware currently. (A minimal timing sketch follows this list.)
  • Whisper has roughly human-level error rates for high-resource languages (English, Spanish, French, …).
  • With low-resource languages (even ones like Kannada, which the Whisper docs list as having a low error rate), Whisper often detects the wrong language (e.g. Tamil). This is a failure imo. (A possible mitigation, pinning the language hint, is sketched after this list.)
  • With high-resource languages, ChatGPT produces results in the spoken language.
    - With low-resource languages, ChatGPT leans toward English output. This is a failure imo because English is just not spoken/read in most low-resource language territories. (The same sketch below also pins the reply language with a system prompt.)
    - With low-resource languages, ChatGPT often produces utter garbage :slight_smile: So "low resource" doesn't just mean Whisper training data; it applies to GPT training data too.
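For reference, here's a minimal sketch of how I'd time the Whisper and ChatGPT legs separately. It assumes the openai Python SDK (v1.x); the model names, file path, and `timed` helper are illustrative placeholders, not the bot's actual code.

```python
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def timed(label, fn):
    """Run fn(), print how long it took, and return its result."""
    t0 = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - t0:.1f}s")
    return result

# "voice_note.mp3" stands in for the audio pulled from the WhatsApp message.
with open("voice_note.mp3", "rb") as audio:
    transcript = timed("whisper", lambda: client.audio.transcriptions.create(
        model="whisper-1", file=audio))

reply = timed("chatgpt", lambda: client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": transcript.text}]))

print(reply.choices[0].message.content)
```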
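And here is a sketch of one way to pin the language end to end: pass Whisper's optional `language` hint (ISO-639-1) so it doesn't guess Tamil for Kannada audio, and add a system prompt so ChatGPT doesn't drift into English. Again this assumes the same SDK; the language code, file path, and prompt wording are placeholders, not a claim about how the bot actually does it.

```python
from openai import OpenAI

client = OpenAI()
USER_LANG = "kn"  # ISO-639-1 code (here Kannada); in practice taken from user settings

# The optional `language` parameter stops Whisper from auto-detecting
# the language itself (the Kannada-transcribed-as-Tamil case above).
with open("voice_note.mp3", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=audio, language=USER_LANG)

# A system prompt nudges ChatGPT to answer in the user's language
# instead of falling back to English.
reply = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": f"Reply only in the language with ISO-639-1 code '{USER_LANG}'. "
                    "Do not switch to English unless the user writes in English."},
        {"role": "user", "content": transcript.text},
    ],
)
print(reply.choices[0].message.content)
```

This helps with the wrong-language detection; it doesn't fix the garbage-output problem for low-resource languages, which looks like a GPT training-data issue rather than a prompting one.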