I’m excited to share a project that I’ve been working on called AIUI.
AIUI is a platform designed to enable seamless two-way verbal communication with artificial intelligence. It aims to bridge the gap between human users and advanced AI, making it easier than ever to interact with AI in a natural, conversational manner.
To give you a better idea of what AIUI is all about, I’ve put together a short demo video:
If this interests you, I invite you to check it out, try it for yourself, and give it a star if you find it useful! Also, please feel free to share your feedback, ideas, or any issues you encounter - every bit of input helps us make AIUI better.
Looking forward to hearing your thoughts and seeing what we can build together!
Hi Luka. I’ve been fighting with Python for over a week trying to do just that. I had issues with many sound libraries, for both input and output, and with knowing when to stop recording (i.e., detecting when speech has stopped). This looks super interesting. I’ll download it and let you know if I have any feedback.
====
After downloading it and inspecting the code: the speech recording and sound are handled in the front end, which makes sense and is, I guess, much easier than doing it in Python.
====
Amazing! It works really well! I was able to run it without much trouble.
The only thing I would try to improve is how it handles the timing of speech: I paused to think for a second and the sentence had already been processed. I’m also not sure what happens if I speak while it is speaking. These are very common issues when talking to AI bots.
I would change the prompt so the model checks whether the sentence looks complete. If it doesn’t, let it return some keyword. Then you will know to wait for the next audio and append it to the current prompt.
If you do that, the conversation might feel more natural, even if I stop to think for a second.
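A minimal sketch of that prompt-conditioning idea, assuming a hypothetical `ask_model` callable that wraps whatever LLM call the backend makes. The `<INCOMPLETE>` sentinel, the `SYSTEM_HINT` wording, and the `TurnAccumulator` class are illustrative names, not part of AIUI; `fake_model` is only a stand-in so the flow can be tried without an API key.

```python
from typing import Callable, List, Optional

# Hypothetical sentinel the model is told to return when the user's
# utterance looks unfinished (an assumption, not part of AIUI).
INCOMPLETE = "<INCOMPLETE>"

# Illustrative instruction you might add to the system prompt.
SYSTEM_HINT = (
    "If the user's message looks like an unfinished sentence, "
    f"reply with exactly {INCOMPLETE} and nothing else."
)


class TurnAccumulator:
    """Collects transcribed chunks until the model judges the turn complete."""

    def __init__(self, ask_model: Callable[[str], str]) -> None:
        # ask_model: hypothetical wrapper around the LLM call (with SYSTEM_HINT
        # applied) that takes the user's text and returns the model's reply.
        self.ask_model = ask_model
        self.pending: List[str] = []

    def add_transcript(self, text: str) -> Optional[str]:
        """Add one transcribed chunk; return the reply, or None to keep waiting."""
        self.pending.append(text.strip())
        reply = self.ask_model(" ".join(self.pending))
        if reply.strip() == INCOMPLETE:
            # Sentence looks unfinished: keep the text and wait for the next audio.
            return None
        self.pending.clear()
        return reply


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM: treats text without end punctuation as unfinished."""
    if not prompt.rstrip().endswith((".", "?", "!")):
        return INCOMPLETE
    return f"You said: {prompt}"
```

With the stub, `add_transcript("What is the")` returns `None`, and a following `add_transcript("weather today?")` returns `"You said: What is the weather today?"`. Note that every chunk triggers a model call, so this trades extra tokens for fewer cut-off sentences.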
====
Just one issue that you should clarify: when running the Docker container, the OPENAI_API_KEY parameter must be given without any quotes (" or '), just OPENAI_API_KEY=sk-… That took me some time to figure out. Maybe, if the string has quotes, you could remove them? This would save a lot of people a lot of time…
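If the backend reads the key in Python (an assumption; the thread doesn’t say how AIUI loads it), stripping the quotes could look like the sketch below. `read_api_key` is a hypothetical helper name, and it only removes one matching pair of surrounding quotes.

```python
import os


def read_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read an API key from the environment, stripping one pair of matching quotes."""
    raw = os.environ.get(var_name, "").strip()
    # A shell removes quotes in `-e OPENAI_API_KEY="sk-..."`, but values read
    # from a Docker --env-file are passed through literally, quotes included.
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in ("'", '"'):
        raw = raw[1:-1].strip()
    if not raw:
        raise RuntimeError(f"{var_name} is not set")
    return raw
```

Only matching outer quotes are removed, so a key that legitimately contains a quote character in the middle is left untouched.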
Hi @Luka_Spahija, awesome work! I agree that interfaces are going to change in the future. Your work is very relevant to a use case I have been working on. I would love to collaborate and see how I can contribute. I’m not sure how the forum works, but if you could direct message me, that would be great.
It was an improper configuration. This was essential – it works really well. I think it would be nice to have a way to stop it from auto-detecting speech, especially since the free quota is so cheap. Can’t test it too much cuz I’m broke lol.
I’ll plan some specific tests and use it. I have an idea of maybe using it to interact with Alexa.
So we’ll eventually have one AI unit seamlessly communicating with 10 or more diverse AI taskers?
What are we going to do if they create their own language in order to communicate more efficiently with each other?
This is quite amazing, we’ll be able to decrease costs substantially inside the tech industry.
I wonder how many solutions are currently out there. Is there a website, catalogue, or specialist who knows all of them and which AI platforms we could benefit from on our journey? This would be great for our startup, as we currently have no CTO on board.
Can OpenAI help with this? Is there any AI consulting out there?
I appreciate your time,
Gastón Corbala
The Backpackster
A very exciting project and one I’m eager to follow. I’m a non-technical community member, and my initial interest is in a specific use case. Obviously there are thousands of use cases for AIUI. Can anyone predict when a sandpit might be available to play in? Many thanks, I think I am @greg_twemlow
I posted a link. Kruel.ai was originally started on this open-source project. I still use parts of it to this day; the ball input/output was the best. These guys were the first to get me into AI development.
To fix the pause-for-a-second problem, I built a buffer system: it takes the input, converts it, stores it, and waits 1 second to see if any more input shows up. If so, it appends it and repeats; if nothing arrives within 1 second, it sends everything for processing.
Easy logic, and it works great. If you want to take it further, you can use machine learning to pick up patterns over time, so it learns the speaker’s way of talking and optimizes the timing. That can reduce delay if you need responses as close to real time as possible.
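A minimal sketch of that 1-second buffer, assuming transcribed text chunks arrive as strings; `SpeechBuffer`, `add`, and `on_flush` are hypothetical names, not taken from Kruel.ai or AIUI. Each new chunk restarts the countdown, and a generation counter makes sure a timer that already fired cannot flush early.

```python
import threading
from typing import Callable, List, Optional


class SpeechBuffer:
    """Collects transcribed chunks and flushes them after a quiet period."""

    def __init__(self, on_flush: Callable[[str], None], quiet_seconds: float = 1.0) -> None:
        self.on_flush = on_flush            # receives the joined text
        self.quiet_seconds = quiet_seconds  # the "wait 1 second" window
        self._chunks: List[str] = []
        self._timer: Optional[threading.Timer] = None
        self._generation = 0                # bumps on every new chunk
        self._lock = threading.Lock()

    def add(self, text: str) -> None:
        """Store a chunk and restart the quiet-period countdown."""
        with self._lock:
            self._chunks.append(text.strip())
            self._generation += 1
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(
                self.quiet_seconds, self._flush, args=(self._generation,)
            )
            self._timer.daemon = True
            self._timer.start()

    def _flush(self, generation: int) -> None:
        with self._lock:
            if generation != self._generation:
                return  # a newer chunk arrived; its own timer will flush
            text = " ".join(self._chunks)
            self._chunks.clear()
            self._timer = None
        if text:
            self.on_flush(text)  # called outside the lock
```

For example, `SpeechBuffer(send_to_model).add("I was thinking")` followed quickly by `.add("about the weather")` results in a single call with `"I was thinking about the weather"` (here `send_to_model` is a placeholder for whatever processes the text). `quiet_seconds` is the knob a learned, per-speaker timing model could tune.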
I wanted to try it out, but unfortunately it is not easy to use for someone without technical knowledge. I could never remember the commands I needed to type…