Is it possible to interact with the Assistants API using one’s voice, similarly to the app?

I’m trying to implement the voice feature offered in the app, which lets users talk to the assistant, but I don’t see how to do it in the docs. Can somebody point me in the right direction? I’m working with Node.js.

You would have to use Whisper to transcribe the audio in real time and pass the transcript to your Assistant as input.
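A minimal sketch of one such voice turn, assuming you supply the two steps yourself: `transcribe` and `askAssistant` are hypothetical callbacks (not SDK functions) that would wrap your Whisper transcription request and your Assistant call. The sketch only shows how the pieces chain together.

```javascript
// One voice turn: audio in -> transcript -> assistant reply text.
// `transcribe` and `askAssistant` are hypothetical, caller-supplied async
// functions; they stand in for your speech-to-text and Assistant calls.
async function voiceTurn(audio, { transcribe, askAssistant }) {
  const userText = (await transcribe(audio)).trim();
  if (userText === "") {
    // Nothing intelligible was said, so skip the assistant call entirely.
    return { userText, replyText: null };
  }
  const replyText = await askAssistant(userText);
  return { userText, replyText };
}
```

Keeping the two steps as injected functions means you can test the flow with stubs first and swap in the real API calls later.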


As of now, the Assistants API does not have this feature built in.

There are, however, some very cool advancements using open-source technology:


Thanks! I will try to implement the feature ASAP.

I’m working on this: a real-time voice chat conversation with an assistant through the API. I will report back here once I have it done, in the next 2 weeks!


I am trying to do the same! :smile:

You can do the same.

Audio can be recorded and then transcribed into text input for the AI with the Whisper API. However, when you think of chat, you probably expect it to be always listening and to understand when you are done.

It is a lot easier to have a start and done/send button for recording.
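As a rough sketch of the button approach, the recorder below only buffers audio between a start and a stop call. `PushToTalkRecorder` is a hypothetical helper, not part of any SDK, and it assumes your microphone library hands you byte chunks (arrays or `Uint8Array`s).

```javascript
// Minimal push-to-talk buffer: keeps audio chunks between start() and stop().
// PushToTalkRecorder is a hypothetical helper, not part of any SDK.
class PushToTalkRecorder {
  constructor() {
    this.recording = false;
    this.chunks = [];
  }

  // Wire this to the "start" button.
  start() {
    this.recording = true;
    this.chunks = [];
  }

  // Call from the microphone callback; chunks outside a recording are dropped.
  push(chunk) {
    if (this.recording) this.chunks.push(chunk);
  }

  // Wire this to the "done/send" button. Returns all captured bytes as one Uint8Array.
  stop() {
    this.recording = false;
    const total = this.chunks.reduce((n, c) => n + c.length, 0);
    const out = new Uint8Array(total);
    let offset = 0;
    for (const c of this.chunks) {
      out.set(c, offset);
      offset += c.length;
    }
    this.chunks = [];
    return out;
  }
}
```

The bytes returned by `stop()` are what you would then send off for transcription.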

Home assistants like Alexa have firmware that is always listening for a wake word. They also use adaptive silence detection to know when you are done talking, and output cancellation so they can hear “alexa quiet” over their own music or recitation. So “hey google” takes a lot of work to recreate.
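To give a feel for the "done talking" part, here is a deliberately simple sketch: a fixed-threshold energy detector, not the adaptive detection those devices use. It assumes the audio arrives as frames of PCM samples scaled to [-1, 1]; the function names, `threshold`, and `silentFramesNeeded` are illustrative choices, not tuned values.

```javascript
// Root-mean-square energy of one frame of PCM samples scaled to [-1, 1].
function rms(frame) {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (const s of frame) sum += s * s;
  return Math.sqrt(sum / frame.length);
}

// Returns the index of the frame that completes the first run of
// `silentFramesNeeded` consecutive quiet frames after speech has started,
// or -1 if the speaker never started or never paused that long.
// `threshold` and `silentFramesNeeded` are illustrative defaults.
function findEndOfUtterance(frames, threshold = 0.02, silentFramesNeeded = 3) {
  let heardSpeech = false;
  let silentRun = 0;
  for (let i = 0; i < frames.length; i++) {
    if (rms(frames[i]) >= threshold) {
      heardSpeech = true;
      silentRun = 0; // any loud frame resets the pause counter
    } else if (heardSpeech) {
      silentRun++;
      if (silentRun >= silentFramesNeeded) return i;
    }
  }
  return -1;
}
```

A fixed threshold like this breaks down with background noise, which is part of why the real thing takes so much work.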

None of that is part of the AI API.

Chat models also like to write answers that take a tedious minute or more to read aloud with text-to-speech, which seems very non-conversational unless you prompt extensively for brevity.
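Besides prompting for brevity, one blunt workaround is to cut the reply down before it goes to text-to-speech. The sketch below is only an illustration: `shortenForSpeech` and its `maxChars` default are made up for this example, and the sentence splitter is naive (it will also split on decimals such as "0.17").

```javascript
// Keeps whole sentences from the start of `text` until adding the next one
// would exceed `maxChars`. The first sentence is always kept, even if it is
// longer than the budget. Naive splitter: breaks on '.', '!' and '?'.
function shortenForSpeech(text, maxChars = 200) {
  const sentences = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) || [];
  let out = "";
  for (const raw of sentences) {
    const s = raw.trim();
    if (s === "") continue;
    const candidate = out === "" ? s : out + " " + s;
    if (out !== "" && candidate.length > maxChars) break;
    out = candidate;
  }
  return out;
}
```

You would still show the full text on screen and only speak the shortened version.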

I came across this GitHub repo. It is a decent start and does the TTS and STT I want, so I’m going to fork it and modify it for my needs.


I’m going to add the plain text to the screen so you can also see the response, plus a history of the chat and maybe even a file upload option.

I have custom assistants set up and configured with documents, with retrieval and code interpreter enabled. It would be nice to show the plain text alongside the voice assistant so that you can export the conversation for different use cases. I did some testing last night and was very impressed. It is also quite cheap: I racked up maybe $0.17. For my purposes this is extremely cheap compared to assigning someone to find the answers I’m looking for in the given data, and much faster!
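The plain-text history and export could look roughly like this. It is a sketch of the idea only: `ChatHistory` and its methods are hypothetical names, and the "You"/"Assistant" labels are just one possible export format.

```javascript
// In-memory chat history with a plain-text export.
// ChatHistory and its method names are hypothetical, not part of any SDK.
class ChatHistory {
  constructor() {
    this.turns = [];
  }

  // Record one turn; `role` must be "user" or "assistant".
  add(role, text) {
    if (role !== "user" && role !== "assistant") {
      throw new Error(`unknown role: ${role}`);
    }
    this.turns.push({ role, text });
  }

  // One line per turn, suitable for showing on screen or saving to a file.
  toPlainText() {
    return this.turns
      .map((t) => `${t.role === "user" ? "You" : "Assistant"}: ${t.text}`)
      .join("\n");
  }
}
```

Each transcribed question and each assistant reply would be added as it arrives, and `toPlainText()` gives the exportable transcript.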