Hi all,
Has anyone found a good way to get the Realtime API to accurately recognize narrowband audio, i.e. 8 kHz mu-law?
I haven't been able to get the API to recognize speech reliably. It is fairly accurate overall, but it struggles with one- or two-word answers to questions, and it does poorly with names and addresses.
The playground is much better, but the model is also getting high-fidelity audio there, so I would expect that.
I’m using Java, but hearing from anyone who has managed this at all would be very helpful.
Thanks!
If the playground works better, then your issue most likely lies with Twilio and/or the way you’re processing the audio.
I’d recommend trying other providers and seeing how they fare.
The playground uses high-quality wideband audio (likely through WebRTC), which is not possible over phone lines.
Ah, sorry.
Still,
You can save a sample and run it through numerous services to see how it fares. You could probably also use some sort of pipeline to improve the quality of the audio before sending it off.
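A rough sketch of the "save a sample" step in Java, assuming the call audio has already been decoded to 16-bit PCM samples. `SampleDump` and `writeWav` are made-up names for illustration; the `javax.sound.sampled` classes are part of the standard JDK.

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper: dumps decoded call audio to a WAV file for offline testing.
final class SampleDump {

    // Writes mono 16-bit PCM samples to a WAV file at the given sample rate.
    static void writeWav(short[] pcm, float sampleRate, File out) {
        // Pack each sample as two little-endian bytes.
        byte[] bytes = new byte[pcm.length * 2];
        for (int i = 0; i < pcm.length; i++) {
            bytes[2 * i] = (byte) (pcm[i] & 0xFF);
            bytes[2 * i + 1] = (byte) ((pcm[i] >> 8) & 0xFF);
        }
        // 16-bit, mono, signed, little-endian.
        AudioFormat format = new AudioFormat(sampleRate, 16, 1, true, false);
        try (AudioInputStream stream = new AudioInputStream(
                new ByteArrayInputStream(bytes), format, pcm.length)) {
            AudioSystem.write(stream, AudioFileFormat.Type.WAVE, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling something like `SampleDump.writeWav(samples, 8000f, new File("call.wav"))` gives you a file you can upload to several transcription services and compare side by side.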
In case you're curious, or for anyone else approaching this: I am now attempting to use this neural net with a bunch of these voice datasets to upscale the audio in real time and feed that to the assistant. I’ll post an update on how it works. I don’t think it's a commonly faced issue in voice, since most people are going phone → phone and therefore don't care to upscale, but in our case we can restore the audio to 16 kHz PCM, which is much higher fidelity. This is a first thought though; hopefully it works.
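For anyone who wants a non-neural baseline to compare against, here is a minimal Java sketch that decodes G.711 mu-law bytes to 16-bit PCM and doubles the sample rate from 8 kHz to 16 kHz by linear interpolation. The names (`MuLawUpsampler`, `decode`, `upsample2x`) are made up for illustration. Note that this only changes the format: it cannot restore frequency content above 4 kHz that the phone line already discarded, which is the part a neural upscaler tries to estimate.

```java
// Hypothetical baseline: G.711 mu-law decode plus naive 2x upsampling.
final class MuLawUpsampler {

    private static final int BIAS = 0x84; // standard G.711 mu-law bias (132)

    // Decodes one mu-law byte to a signed 16-bit linear PCM sample.
    static short decode(byte muLaw) {
        int u = ~muLaw & 0xFF;            // mu-law bytes are stored bit-inverted
        int sign = u & 0x80;
        int exponent = (u >> 4) & 0x07;
        int mantissa = u & 0x0F;
        int magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS;
        return (short) (sign != 0 ? -magnitude : magnitude);
    }

    // Decodes an 8 kHz mu-law buffer and returns 16 kHz PCM.
    // Each original sample is kept, and the midpoint between neighbours is
    // inserted after it; the final sample is simply repeated.
    static short[] upsample2x(byte[] muLaw) {
        int n = muLaw.length;
        short[] pcm = new short[n];
        for (int i = 0; i < n; i++) {
            pcm[i] = decode(muLaw[i]);
        }
        short[] out = new short[n * 2];
        for (int i = 0; i < n; i++) {
            int next = (i + 1 < n) ? pcm[i + 1] : pcm[i];
            out[2 * i] = pcm[i];
            out[2 * i + 1] = (short) ((pcm[i] + next) / 2);
        }
        return out;
    }
}
```

If the model does no better on this output than on the raw mu-law stream, that would suggest the missing bandwidth is the problem rather than the encoding, and would help justify the heavier neural approach.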