Is it possible to interact with the Assistants API using one's voice, similarly to the app?

jonathang2560 · February 1, 2024, 5:26pm

I’m trying to implement the voice feature offered in the app, which lets users talk to the assistant with their voice, but I can’t find how to do it in the docs. Can somebody point me in the right direction? I’m working with Node.js.

matcha72 · February 1, 2024, 5:40pm

You would have to use Whisper to transcribe the audio in real time and pass the transcript to your Assistant as input.
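A minimal sketch of that flow in Node.js, under some assumptions: `client` is an instance of the official `openai` package (a recent v4 release that has `runs.createAndPoll`), the assistant and thread already exist, and `audioFile` is something the SDK accepts as an upload, such as `fs.createReadStream("question.wav")`. The function name `askAssistantByVoice` is made up for illustration.

```javascript
// Sketch only: speech -> Whisper transcript -> Assistant reply (text).
// `client` is assumed to be an `openai` SDK instance; it is passed in so the
// function has no hard dependency and can be exercised with a stub.
async function askAssistantByVoice(client, assistantId, threadId, audioFile) {
  // 1. Speech to text with Whisper.
  const transcription = await client.audio.transcriptions.create({
    file: audioFile,
    model: "whisper-1",
  });

  // 2. Add the transcript to the thread as a user message.
  await client.beta.threads.messages.create(threadId, {
    role: "user",
    content: transcription.text,
  });

  // 3. Run the assistant on the thread and wait for the run to finish.
  const run = await client.beta.threads.runs.createAndPoll(threadId, {
    assistant_id: assistantId,
  });
  if (run.status !== "completed") {
    throw new Error(`Run ended with status: ${run.status}`);
  }

  // 4. Messages are listed newest first by default; take the latest assistant one.
  const messages = await client.beta.threads.messages.list(threadId);
  const reply = messages.data.find((m) => m.role === "assistant");
  if (!reply) {
    throw new Error("No assistant reply found in thread");
  }
  const answer = reply.content
    .filter((part) => part.type === "text")
    .map((part) => part.text.value)
    .join("\n");

  return { question: transcription.text, answer };
}
```

With the real SDK you would call it roughly as `askAssistantByVoice(new OpenAI(), "asst_...", thread.id, fs.createReadStream("question.wav"))`. Note that this transcribes a finished recording; it is not streaming.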

anon10827405 · February 1, 2024, 5:41pm

As of now, the Assistants API does not come with this feature built in.

There are, however, some very cool advancements using open-source technology:

jonathang2560 · February 2, 2024, 6:06am

Thanks! I will try to implement the feature asap.

CinematicDev · February 2, 2024, 6:32am

I’m working on this! Something like a real-time voice conversation with an assistant through the API. I’ll report back here once I have it done, in the next two weeks!

rockettpc · February 11, 2024, 8:07am

I am trying to do the same!

_j · February 11, 2024, 9:24am

You can do the same.

Audio can be recorded, then transcribed into text input for the AI with the Whisper API. However, when you think of chat, you probably expect it to be always listening and to understand when you are done.

It is a lot easier to have a start and done/send button for recording.
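As a rough sketch of the start/done approach in Node.js: the class below collects raw audio between a `start()` and a `stop()` call and returns a WAV buffer that could be saved and sent for transcription. It assumes your microphone library (not shown, and up to you) delivers 16-bit little-endian PCM chunks as `Buffer`s; `PushToTalkRecorder` and `pcmToWav` are names invented for this example.

```javascript
// Wrap raw 16-bit PCM in a standard 44-byte WAV (RIFF) header.
function pcmToWav(pcm, sampleRate, channels) {
  const bytesPerSample = 2; // 16-bit samples
  const header = Buffer.alloc(44);
  header.write("RIFF", 0, "ascii");
  header.writeUInt32LE(36 + pcm.length, 4); // file size minus 8 bytes
  header.write("WAVE", 8, "ascii");
  header.write("fmt ", 12, "ascii");
  header.writeUInt32LE(16, 16); // size of the fmt chunk
  header.writeUInt16LE(1, 20); // audio format 1 = PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * channels * bytesPerSample, 28); // byte rate
  header.writeUInt16LE(channels * bytesPerSample, 32); // block align
  header.writeUInt16LE(16, 34); // bits per sample
  header.write("data", 36, "ascii");
  header.writeUInt32LE(pcm.length, 40);
  return Buffer.concat([header, pcm]);
}

// Hypothetical helper: buffers audio only while the user is "holding the button".
class PushToTalkRecorder {
  constructor({ sampleRate = 16000, channels = 1 } = {}) {
    this.sampleRate = sampleRate;
    this.channels = channels;
    this.chunks = [];
    this.recording = false;
  }

  // Call when the user presses "start".
  start() {
    this.chunks = [];
    this.recording = true;
  }

  // Call for every PCM chunk the microphone delivers; ignored unless recording.
  push(chunk) {
    if (this.recording) this.chunks.push(chunk);
  }

  // Call when the user presses "done/send"; returns the recording as a WAV buffer.
  stop() {
    this.recording = false;
    return pcmToWav(Buffer.concat(this.chunks), this.sampleRate, this.channels);
  }
}
```

The buffer returned by `stop()` can be written to disk with `fs.writeFileSync("question.wav", wav)` and then uploaded for transcription.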

Home assistants like Alexa have firmware that is always listening for a wake word. They also have some smarts in the form of adaptive voice silence detection to know when you are done talking, plus output cancellation so they can hear “alexa quiet” over their own music or recitation. So “hey google” takes a lot of work to recreate.

None of that is part of the AI API.
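To give a feel for the simplest piece of that, here is a sketch of fixed-threshold silence detection: it reports "done talking" once speech has been heard and is then followed by a run of quiet frames. This is far cruder than the adaptive detection in a real device, and it does no wake-word spotting or output cancellation. The class name, the threshold of 500, and the frame counts are illustrative assumptions that would need tuning for a real microphone.

```javascript
// Root-mean-square loudness of one frame of 16-bit samples (Int16Array).
function rms(frame) {
  if (frame.length === 0) return 0;
  let sumOfSquares = 0;
  for (const sample of frame) sumOfSquares += sample * sample;
  return Math.sqrt(sumOfSquares / frame.length);
}

// Hypothetical end-of-speech detector with a fixed (not adaptive) threshold.
class EndOfSpeechDetector {
  constructor({ threshold = 500, silentFramesToStop = 25 } = {}) {
    this.threshold = threshold; // RMS level treated as "speech"
    this.silentFramesToStop = silentFramesToStop; // quiet frames needed to stop
    this.heardSpeech = false;
    this.silentRun = 0;
  }

  // Feed one audio frame; returns true once the speaker appears to be done.
  feed(frame) {
    if (rms(frame) >= this.threshold) {
      this.heardSpeech = true;
      this.silentRun = 0; // any speech resets the silence counter
      return false;
    }
    if (!this.heardSpeech) return false; // ignore silence before any speech
    this.silentRun += 1;
    return this.silentRun >= this.silentFramesToStop;
  }
}
```

With 20 ms frames, `silentFramesToStop = 25` corresponds to about half a second of silence before the recording is considered finished.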

Chat models also like to write answers that take a tedious minute or more to read aloud with text-to-speech, which feels very non-conversational unless you prompt extensively for brevity.
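One way to approach this, sketched under assumptions: ask for short spoken-style answers on each run, then send the reply text to the text-to-speech endpoint. `client` is again assumed to be an `openai` SDK instance; `SPOKEN_STYLE` and `speakReply` are invented names, the wording of the instruction is only a starting point, and the `tts-1` model and `alloy` voice are just example choices.

```javascript
// Example instruction to keep replies short enough to listen to.
// It could be passed as `additional_instructions` when creating a run, e.g.
//   client.beta.threads.runs.createAndPoll(threadId, {
//     assistant_id: assistantId,
//     additional_instructions: SPOKEN_STYLE,
//   });
const SPOKEN_STYLE =
  "Answer in at most two short sentences, in plain conversational language, " +
  "with no lists or markdown.";

// The speech endpoint accepts at most 4096 characters of input.
const MAX_TTS_CHARS = 4096;

// Convert reply text to audio and return it as a Buffer (MP3 by default).
async function speakReply(client, text) {
  const response = await client.audio.speech.create({
    model: "tts-1",
    voice: "alloy",
    input: text.slice(0, MAX_TTS_CHARS), // crude guard against over-long replies
  });
  return Buffer.from(await response.arrayBuffer());
}
```

The returned buffer can be written to a file such as `reply.mp3` or sent to the browser for playback. Prompting for brevity does not guarantee short answers, so the length guard is still worth keeping.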

rockettpc · February 11, 2024, 8:18pm

I came across this GitHub repo. It is a decent start and does the TTS and STT I want, so I’m going to fork it and modify it for my needs.

sohail8611/voice_assistant_with_openai

I’m going to add the plain text to the screen so you can also see what the response is, plus a history of the chat, and maybe even a file upload option.

I have custom assistants set up and configured with documents and retrieval enabled, along with code interpreter. But it would be nice to add the plain text along with the voice assistant, so that you can export the conversation for different use cases. I did some testing last night and was very impressed, and it is quite cheap: I racked up maybe $0.17. For my purposes this is extremely cheap compared to assigning someone to find the answers I’m looking for in the given data, and much faster!

wildplanting · August 18, 2024, 5:01pm

Sorry to bump a thread but it’s relevant to my project; have there been any updates on this?
