I got access to the Realtime API this morning.
Playing around with it in the playground.
Seems like the model is pretty nerfed vs Advanced Voice Mode. It has a limited set of voices with a lot less range than AVM in pronunciation, tonality and accents.
You might be right about the accents. I seem to have no problem in AVM (Indian accent) – but the Realtime API seems to be having issues with the accent (at least for me) – this is in preliminary testing.
Beyond this: This Realtime API is amazing … (just created a POC for Realtime API + RAG) – this is going to change a lot of things (like data centers)
Yep, based in Europe. Asked it to switch to Hungarian. AVM is surprisingly good at it, standard voice mode is terrible. This was A LOT closer to SVM.
Haven’t yet checked RAG, but did play around with function calling. Pricing is crazy so will need to do a lot of cost optimization in architecture to make it worthwhile.
Using tools (function calling) to get the context from the RAG (using our own RAG platform) – that part is working great … It's the UX challenges that seem to be a pain.
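For anyone wondering what "tools to get the context from the RAG" can look like, here is a rough sketch, not the setup described above. It assumes the Realtime API event shapes as I understand them from the docs (`session.update` to register tools, `response.function_call_arguments.done` carrying `call_id` and `arguments`, then `conversation.item.create` with a `function_call_output` item followed by `response.create`); check these against the current reference. The tool name `search_knowledge_base`, the toy `DOCS` list and the keyword matcher are made up for illustration, and the WebSocket transport is left out.

```python
import json

# Hypothetical tool schema; the name and parameters are illustrative only.
SEARCH_TOOL = {
    "type": "function",
    "name": "search_knowledge_base",
    "description": "Look up passages relevant to the user's question.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

# Toy in-memory corpus standing in for a real RAG platform.
DOCS = [
    "Refunds are processed within 5 business days.",
    "Support is available Monday to Friday.",
]


def search_knowledge_base(query: str) -> list[str]:
    """Naive keyword match; a real system would use vector search."""
    words = set(query.lower().split())
    return [d for d in DOCS if words & set(d.lower().rstrip(".").split())]


def session_update_event() -> dict:
    """Event that registers the tool with the session (assumed shape)."""
    return {"type": "session.update", "session": {"tools": [SEARCH_TOOL]}}


def handle_function_call(event: dict) -> list[dict]:
    """Turn a finished function-call event into the events to send back.

    `event` is assumed to look like a `response.function_call_arguments.done`
    payload with `call_id` and a JSON string in `arguments`.
    """
    args = json.loads(event["arguments"])
    passages = search_knowledge_base(args["query"])
    return [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": event["call_id"],
                "output": json.dumps({"passages": passages}),
            },
        },
        # Ask the model to continue now that the tool result is in the conversation.
        {"type": "response.create"},
    ]
```

In a real client you would send `session_update_event()` once after connecting, then send each dict returned by `handle_function_call` over the socket as JSON whenever a function-call event arrives.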
I asked it to respond in Hindi and it did a pretty good job with the language and the accent. (The only odd thing was an American voice talking in Hindi – but beyond that, I was quite surprised that it got the pronunciations mostly right.)
RAG works great with function calling – got it working well … the only problem is controlling hallucinations (so you will need to test and control extensively for that).
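One way to test and control for that, as a sketch only and not necessarily what was done above: filter what the tool returns so the model gets an explicit "nothing found" signal instead of weak context it might embellish. The `overlap_score` function, the `MIN_SCORE` threshold and the output fields below are all assumptions for illustration; a real setup would use the retriever's own relevance scores and a threshold tuned on test data.

```python
import json

MIN_SCORE = 0.5  # assumed threshold; tune against your own test set


def overlap_score(query: str, passage: str) -> float:
    """Fraction of query words that appear in the passage (toy relevance score)."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0


def build_tool_output(query: str, passages: list[str]) -> str:
    """Keep only passages above the threshold; otherwise tell the model to abstain."""
    kept = [p for p in passages if overlap_score(query, p) >= MIN_SCORE]
    if not kept:
        return json.dumps({
            "found": False,
            "instruction": "No supporting context was found. Say you do not know.",
        })
    return json.dumps({"found": True, "passages": kept})
```

The string returned by `build_tool_output` would be what you pass back as the function call output. This does not stop hallucinations on its own; it only removes one common trigger, so it still needs testing against real queries.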