Voice feature in my chat application

I am adding a voice feature to a chat application. At first I was planning to use Whisper together with the Assistant, but now I am considering attaching audio to the Assistant directly to reduce latency. Any thoughts on this approach?

No API language models currently support multimodal audio input, so your approach isn't feasible at this time.

Models also can't do anything productive with partial text input, such as pre-processing it, so you would still have to send the completed thought to the AI model.


There are, however, techniques you could use to accelerate the Whisper transcription. Foremost would be to use silence detection within the audio to find split points, so that chunks of audio can be sent to Whisper, either your local open-source version or the one on OpenAI's API.
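As a minimal sketch of that idea, assuming pydub (which needs ffmpeg installed) for the silence detection and the current openai Python SDK; the file name and the silence thresholds here are placeholders you would tune for your speakers and microphones:

```python
# Split audio at silences and transcribe each chunk with Whisper.
from io import BytesIO

from openai import OpenAI
from pydub import AudioSegment
from pydub.silence import split_on_silence

client = OpenAI()

audio = AudioSegment.from_file("speech.wav")  # placeholder input file
chunks = split_on_silence(
    audio,
    min_silence_len=400,   # ms of silence that counts as a split point
    silence_thresh=-40,    # dBFS below which audio is treated as silence
    keep_silence=200,      # pad each chunk so words aren't clipped
)

texts = []
for i, chunk in enumerate(chunks):
    buf = BytesIO()
    chunk.export(buf, format="wav")
    buf.name = f"chunk_{i}.wav"  # the SDK infers the audio format from the name
    result = client.audio.transcriptions.create(model="whisper-1", file=buf)
    texts.append(result.text)

print(" ".join(texts))
```

The thresholds are the main knobs: too aggressive a split clips words mid-syllable, too lenient produces chunks so long that splitting buys you nothing.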

This could be used to run parallel transcriptions on long audio, or to transcribe while the speech input is still ongoing. The "prompt" parameter of the API lets you feed the previous transcription back in as a starting point to continue from, if you wanted to build a streaming transcriber.
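A rough sketch of that prompt-continuation approach, assuming chunk files like those produced by the silence splitting above (the helper name `transcribe_stream` is just for illustration):

```python
# Streaming-style transcription: feed the tail of the transcript so far into
# the `prompt` parameter so each new chunk is transcribed with prior context.
from openai import OpenAI

client = OpenAI()

def transcribe_stream(chunk_paths):
    transcript = ""
    for path in chunk_paths:
        with open(path, "rb") as f:
            result = client.audio.transcriptions.create(
                model="whisper-1",
                file=f,
                # whisper-1 only considers roughly the last 224 tokens of the
                # prompt, so a tail of the running transcript is enough
                prompt=transcript[-500:],
            )
        transcript += " " + result.text
    return transcript.strip()
```

For audio that is already fully recorded, the same per-chunk calls could instead be dispatched in parallel (e.g. with `concurrent.futures`), at the cost of losing the cross-chunk prompt context.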

Silence doesn’t necessarily indicate the end of a sentence, so the re-joined product may not be of the same quality, but the AI can usually tolerate and overlook a few misinterpreted words.


Hi @siddiquiowais390 and welcome to the community!

My gut feeling is that, since the Assistants API is in general much slower and is the bottleneck, it won't make much of a difference. But I would be interested to see your results/comparison if you do try!

Another approach, if you want to cut down transcription latency, is to use a fast local solution for that step. I would recommend looking at whisper.cpp, since it is known for being very fast. But it depends, of course, on how you run your app.
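If your app can shell out to a local binary, a sketch like the one below would work; the binary name (`main` in classic whisper.cpp builds, `whisper-cli` in newer ones) and the model path are assumptions about your setup:

```python
# Shell out to a local whisper.cpp build for fast on-device transcription.
import subprocess
from pathlib import Path

def transcribe_local(wav_path: str) -> str:
    subprocess.run(
        [
            "./main",                         # whisper.cpp example binary (path is an assumption)
            "-m", "models/ggml-base.en.bin",  # ggml model file downloaded for whisper.cpp
            "-f", wav_path,                   # input audio
            "-otxt",                          # write the transcript to <wav_path>.txt
        ],
        check=True,
    )
    return Path(wav_path + ".txt").read_text().strip()
```

Note that the whisper.cpp example binary expects 16 kHz mono WAV input, so you may need an ffmpeg resampling step in front of it.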
