Introducing AIUI: A New Platform for Seamless Two-Way Verbal Communication with AI

Hello everyone!

I’m excited to share a project that I’ve been working on called AIUI.

AIUI is a platform designed to enable seamless two-way verbal communication with artificial intelligence. It aims to bridge the gap between human users and advanced AI, making it easier than ever to interact with AI in a natural, conversational manner.

To give you a better idea of what AIUI is all about, I’ve put together a short demo video:

AIUI is open-source and hosted on GitHub at lspahija/AIUI. I’m actively seeking feedback, suggestions, and contributions from the community to help improve the platform and shape its future development.

If this interests you, I invite you to check it out, try it for yourself, and give it a star if you find it useful! Also, please feel free to share your feedback, ideas, or any issues you encounter - every bit of input helps us make AIUI better.

Looking forward to hearing your thoughts and seeing what we can build together!

Thank you!


I haven’t tried it yet, but the video looks pretty cool.

  • does it handle other languages?
  • is it “easy” to also display the text on screen?
  • can you choose a custom voice for the assistant?

Thank you!

  • Languages other than English are supported when using gTTS for text-to-speech
  • Voice selection (by setting an environment variable) is supported when using ElevenLabs or EdgeTTS for text-to-speech
  • Adding text to the screen would be relatively easy and I plan on doing so, but want to make sure it’s presented elegantly in the UI
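Since the voice and TTS provider are chosen via environment variables, here is a minimal Python sketch of how such configuration might be read. The variable names (`TTS_PROVIDER`, `VOICE`, `LANGUAGE`) are illustrative assumptions — check the AIUI README for the exact names your version uses.

```python
import os

def tts_config(env=os.environ):
    """Read TTS settings from environment variables.

    NOTE: the variable names below are hypothetical examples,
    not necessarily the ones AIUI actually reads.
    """
    return {
        "provider": env.get("TTS_PROVIDER", "gTTS"),  # e.g. gTTS / ELEVENLABS / EDGETTS
        "voice": env.get("VOICE"),                    # provider-specific voice id
        "language": env.get("LANGUAGE", "en"),        # language code, used by gTTS
    }
```

Passing a dict instead of `os.environ` makes the function easy to test without touching the real environment.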

I’m interested in two-way (or at least one-way) image-based communication, which could make things even more seamless — happy to brainstorm if anyone wants to.

Hi Luka. I have been fighting with Python for over a week doing just that. I had issues with many sound libraries, with input and output, and with knowing when to stop recording (i.e., when speech had stopped). This looks super interesting. I’ll download it and let you know of any feedback.

====
After downloading it and inspecting the code: the speech recording and playback are done in the front end, which makes sense and is, I guess, much easier than doing it in Python.

====
Amazing! It works really well! I was able to run it without much trouble.
The one thing I would try to improve is handling speech pauses: I paused to think for a second and the sentence was already processed. The same goes for speaking while the assistant is speaking — I’m not sure what happens then. This is a very common issue when talking to AI bots.
I might change the prompt and condition it on whether the sentence looks complete. If not, let it return some keyword; then you know to wait for the next audio and append it to the current prompt.
If you do that, the conversation might feel more natural even when I stop to think for a second.
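The keyword idea above could be sketched like this. `ask_llm` is a stand-in for whatever chat-completion call you actually use, and the `<INCOMPLETE>` sentinel is an assumption of mine, not anything in AIUI itself.

```python
# Sentinel the model returns when the utterance looks cut off mid-thought.
INCOMPLETE = "<INCOMPLETE>"

SYSTEM_PROMPT = (
    "If the user's message looks like a complete sentence, answer it. "
    f"If it looks cut off mid-thought, reply with exactly {INCOMPLETE}."
)

def handle_transcript(chunk, buffer, ask_llm):
    """Append a new transcript chunk; return (reply, new_buffer).

    reply is None while the model judges the utterance incomplete,
    in which case the caller keeps buffering and waits for more audio.
    """
    text = (buffer + " " + chunk).strip()
    reply = ask_llm(SYSTEM_PROMPT, text)
    if reply.strip() == INCOMPLETE:
        return None, text   # keep buffering; wait for the next chunk
    return reply, ""        # complete utterance: reply and reset buffer
```

In practice you would wire `ask_llm` to your chat API of choice; the buffering logic itself is model-agnostic.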

====
Just one issue that you should clarify: when running the Docker container, the OPENAI_API_KEY parameter must be passed without any quotes (" or ') — just OPENAI_API_KEY=sk-… That took me some time to understand. Maybe if the string has quotes, you could strip them? That would save a lot of people a lot of time…
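The quote-stripping suggestion is a one-liner on the backend. Here is a minimal sketch; `clean_api_key` is a hypothetical helper name, not something in the AIUI codebase.

```python
import os

def clean_api_key(raw: str) -> str:
    """Strip accidental surrounding quotes from an env-var value,
    so '"sk-..."' and "'sk-...'" both become 'sk-...'."""
    key = raw.strip()
    if len(key) >= 2 and key[0] == key[-1] and key[0] in ("'", '"'):
        key = key[1:-1]
    return key

# Usage (assuming the variable is set):
# api_key = clean_api_key(os.environ["OPENAI_API_KEY"])
```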

Hi @Luka_Spahija - Awesome work, and I agree that the future of interfaces is going to change. Your work is very relevant to a use case that I have been working on. I would love to collaborate and see how I can help contribute. Not sure how the forum works, but if you can direct message me, that would be great.

Btw, I am new to this community!

I would also love to help — this is useful for my use cases as well, but I am not that good with Python. Let me know how I can contribute otherwise.


I installed it and it reacts to my mic, but I can’t get any sound out of it. Any ideas how to fix this?

It was an improper configuration. That was the essential fix — it works really well. I think it would be nice to have a way to stop it from auto-detecting speech, especially given the free quota. Can’t test it too much cuz I’m broke lol.

Will plan some specific tests and use it — I have an idea of using it to interact with Alexa, maybe.


Interesting and very helpful tool, I recommend it

So we’ll eventually have one AI unit seamlessly communicating with 10 or more diverse AI taskers?
What are we going to do if they create their own language as their intent to communicate more efficiently between them?

This is quite amazing, we’ll be able to decrease costs substantially inside the tech industry.

I wonder how many solutions are currently out there — is there a website, catalogue, or specialist who knows all of them? What AI platforms can we benefit from in our journey? This would be great for our startup, as we currently have no CTO on board.

Can OpenAI help with this? Is there any AI consulting out there?

I appreciate your time,
Gastón Corbala
The Backpackster


A very exciting project and one I’m eager to follow. I’m a non-technical community member and my initial interest is in a specific use case — though obviously there are thousands of use cases for AIUI. Can anyone predict when a sandpit might be available to play in? Many thanks — I think I am @greg_twemlow

I posted a link — Kruel.ai was originally started on this open-source project. I still use parts of it to this day; the ball input/output was the best. These guys were the first to get me into AI development.

Thanks @Luka_Spahija

Kruel.ai V5.0 - Api companion with full understanding running 16k thanks to advanced memory system - #56 by darcschnider

To fix the pause-for-a-second problem, I built a buffer system that takes the input, converts and stores it, then waits one second to see if any more input shows up. If it does, it appends and repeats; if nothing arrives within one second, it sends the buffer for processing.

It’s easy logic and works great. If you want to take it further, you can use machine learning on the speaker’s patterns over time, so it learns their way of talking and optimizes the timing — that can reduce delay if you need the overall latency as low as possible.
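The buffering scheme above can be sketched as a simple debounce. This is my own minimal sketch, not the poster's actual code; the class and method names are illustrative, and the injectable `clock` is just there to make the timing testable.

```python
import time

class SpeechBuffer:
    """Debounce transcript chunks: flush only after `quiet_secs`
    have passed with no new input (the 1-second wait described above)."""

    def __init__(self, quiet_secs=1.0, clock=time.monotonic):
        self.quiet_secs = quiet_secs
        self.clock = clock          # injectable for testing
        self.chunks = []
        self.last_input = None

    def add(self, chunk):
        """Record a new transcript chunk and reset the quiet timer."""
        self.chunks.append(chunk)
        self.last_input = self.clock()

    def poll(self):
        """Return the joined utterance once the speaker has been quiet
        for `quiet_secs`, else None (keep waiting)."""
        if self.chunks and self.clock() - self.last_input >= self.quiet_secs:
            utterance = " ".join(self.chunks)
            self.chunks = []
            return utterance
        return None
```

A real pipeline would call `poll()` on a timer or event loop tick and send any returned utterance on for processing.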