Hey all —
I’ve been building an assistant using the Realtime STS API, layered on top of a tool-kernel-style architecture where features (like reminders, multi-modal outputs, journaling, to-do lists, etc.) are added as dynamic “toolpacks.”
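For anyone curious what I mean by a toolpack: it's roughly a named bundle of tool functions the kernel can register and dispatch at runtime. A minimal sketch of the idea (all names here — `ToolPack`, `ToolKernel`, `register` — are mine for illustration, not from any particular SDK):

```python
# Minimal sketch of a dynamic "toolpack" registry (names hypothetical).
# Each pack bundles related tool functions the kernel exposes to the model.
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ToolPack:
    name: str
    tools: Dict[str, Callable[..., str]] = field(default_factory=dict)


class ToolKernel:
    def __init__(self) -> None:
        self.packs: Dict[str, ToolPack] = {}

    def register(self, pack: ToolPack) -> None:
        """Add a feature pack (reminders, journaling, ...) at runtime."""
        self.packs[pack.name] = pack

    def call(self, pack_name: str, tool_name: str, **kwargs) -> str:
        """Dispatch a tool call coming back from the model."""
        return self.packs[pack_name].tools[tool_name](**kwargs)


kernel = ToolKernel()
kernel.register(ToolPack("reminders", {
    "create": lambda text, when: f"reminder set: {text} at {when}",
}))
print(kernel.call("reminders", "create", text="call mom", when="5pm"))
```

The nice part is that features stay decoupled: adding journaling or to-do lists is just registering another pack, and the same registry backs voice, Slack, and web.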
It runs locally on a Pi, uses voice activation, and shares memory between voice, Slack, and web — so I can say “Hey Computer, email my wife and tell her I’ll be late” or “block my calendar for dinner on Friday and send me a recipe link and the grocery list to my Slack channel.”

That said, I’m still trying to figure out what hands-free use cases people actually want beyond the obvious stuff like “read my email” or fetching the weather. Frankly, those just aren’t a good use of token cost for the STS interface; we route them to plain TTS output instead, or it’s simply faster to read the page.
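The routing decision above is trivial in code but matters a lot for cost. A hypothetical sketch of how I triage requests — intent names here are made up, and a real version would classify intents with the model rather than a hardcoded set:

```python
# Hypothetical modality router: cheap read-only intents skip the
# realtime speech-to-speech session and go straight to TTS playback;
# everything else stays on the full STS path.
CHEAP_INTENTS = {"read_email", "weather", "read_page"}


def pick_modality(intent: str) -> str:
    """Return 'tts' for low-value read-aloud requests, 'sts' otherwise."""
    return "tts" if intent in CHEAP_INTENTS else "sts"


print(pick_modality("weather"))      # cheap: synthesize locally
print(pick_modality("email_wife"))   # needs the full realtime stack
```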
I’d love to hear from others building or experimenting with STS agents, whether purely hands-free or in blended use cases that combine modalities (TTS, STT, STS, etc.):
- What non-customer-service, personal-use voice flows are actually valuable to you?
- When does hands-free AI go from novelty → utility?
- Have you found any repeatable use cases (e.g., hands-free workflows, proactive nudges, home/lifestyle routines)?
Bonus points for weird or hyper-specific ones — I’m especially curious about patterns like:
- I talk through my day to plan it (e.g., creating my task list on my drive to work)
- I have it track and remind me about personal commitments or habits.
Happy to share more about my work if folks are interested — just looking to learn from what others are exploring with the real-time voice stack.
— Tom
[www.razertech.com]
Building things that listen