One of the first real use cases I’m trying to get right here is a reading assistant for children.
When founding something, getting traction and staying grounded in the real world can be challenging, particularly in a space like AI where there are so many high balls flying overhead.
One of the Dad jobs I found myself doing was helping my daughter read, specifically calling out words she found too challenging to figure out for herself. She’s very gifted creatively but a late reader (just like me). She’d call out the letters and I’d tell her the word.
I was doing this right around the time I was also exploring the realtime voice API. Basically, you get an ephemeral token via your server, then establish a direct WebRTC connection for two-way voice.
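A minimal sketch of that two-step flow, assuming the OpenAI Realtime WebRTC setup: the endpoint URLs, model name, field names, and the `oai-events` channel name here reflect my reading of the docs and may need adjusting against the current API.

```javascript
// Step 1 (server side): describe the request that mints a short-lived
// ephemeral token, so the real API key never reaches the browser.
// Model name and voice are illustrative assumptions.
function sessionRequest(model, instructions) {
  return {
    url: "https://api.openai.com/v1/realtime/sessions",
    body: { model, voice: "verse", instructions },
  };
}

// Step 2 (browser side): use the ephemeral token to negotiate a direct
// WebRTC connection carrying two-way audio plus a data channel for events.
async function connectRealtime(ephemeralKey, model) {
  const pc = new RTCPeerConnection();

  // Play the model's audio as it arrives.
  pc.ontrack = (e) => {
    const audio = new Audio();
    audio.srcObject = e.streams[0];
    audio.play();
  };

  // Send the microphone upstream.
  const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
  pc.addTrack(mic.getTracks()[0], mic);

  // Data channel for JSON events (e.g. transcripts that could drive
  // the letter pop-ups mentioned below).
  const events = pc.createDataChannel("oai-events");

  // Standard SDP offer/answer exchange, authorized with the ephemeral token.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  const resp = await fetch(`https://api.openai.com/v1/realtime?model=${model}`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${ephemeralKey}`,
      "Content-Type": "application/sdp",
    },
    body: offer.sdp,
  });
  await pc.setRemoteDescription({ type: "answer", sdp: await resp.text() });
  return { pc, events };
}
```

The point of the split is that the browser only ever holds the short-lived token; the long-lived API key stays on the server that mints it.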
This has been an interesting use case to build. And I have my first user! (Hey, if some can claim Mom as a user, then I can claim my daughter.)
Technically this is interesting from a latency standpoint. It’s much easier to get going than doing server-based chunking for streaming. And for the first time I feel the agents I build are truly interactive. Honestly, even chat completions (the fastest option) won’t work here; ~10 seconds to get an answer is too slow.
Problems, issues? This is certainly fun, but I still have to add more wiring to pop up letters as they’re spoken, and the full word at the end. That’s to showcase where I think the niche for this platform will be: “chatbot does real stuff”.
There’s a link at the end to see screenshots and use the live system. At some point, if there’s volume, I’ll have to put this behind a user login.
The only issue I’ve come across with this API is that it exposes the prompt (the instructions field). For some that’s a deal-breaker, but I honestly feel that if all you have for tech is your prompt, there isn’t much barrier to entry anyhow. The real differentiators will be building things much more complex, so I worry less about leaking the prompt.
The progress in this space still kind of blows my mind.
-J
