We have completely rebuilt the K7 messaging application as well as the voice input/output application. The original monolithic codebase was divided into approximately ten modular files, which we have been updating and testing systematically to ensure seamless functionality across the split components.
While working on the TTS (Text-to-Speech) code, we identified an opportunity to refine parts of the voice processing. That led us into an extensive exploration of voice cloning from reference .wav files. The cloning works well, though generation is currently somewhat slow because limited GPU memory rules out faster model compilation. With the new hardware we are awaiting, this process should soon be close to real-time.
Once optimized, the AI system will be capable of taking one or two voice samples and generating responses in that voice. This advancement enables us to integrate a feature where users can upload voice samples, test the results, and save the voice with a custom label in a selectable menu for future use.
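As a rough sketch of how that selectable voice menu could work, here is a minimal label-to-embedding registry. Everything here (the `voices.json` file, the list-of-floats embedding format) is a hypothetical stand-in; the real embeddings would come from the TTS engine.

```python
import json
from pathlib import Path

REGISTRY = Path("voices.json")  # hypothetical storage location

def save_voice(label: str, embedding: list[float]) -> None:
    """Store a named voice embedding so it can be reselected later."""
    voices = json.loads(REGISTRY.read_text()) if REGISTRY.exists() else {}
    voices[label] = embedding
    REGISTRY.write_text(json.dumps(voices))

def list_voices() -> list[str]:
    """Labels shown in the selectable voice menu."""
    if not REGISTRY.exists():
        return []
    return sorted(json.loads(REGISTRY.read_text()))

def load_voice(label: str) -> list[float]:
    """Fetch a previously saved embedding by its label."""
    return json.loads(REGISTRY.read_text())[label]
```

A user uploading a sample would trigger embedding extraction once, then `save_voice("Mom", embedding)`; the menu is just `list_voices()`.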
This then offers an offline voice that is nearly as good as most of those out there. We will chase this even further down the road: with the ability to take lots of samples, we could perhaps build a full model rather than rebuilding a voice embedding from the .wav every time.
We have also been continuously using the new smart-memory.py, mostly running it with the OpenAI model.
If you look at the costs of running it, it is still very affordable.
It's also nice that cached inputs save some money.
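To illustrate why cached inputs help, here is the arithmetic with made-up per-token rates (actual pricing depends on the model; cached input is typically billed at a discount to fresh input):

```python
def monthly_cost(input_tokens: int, cached_fraction: float,
                 price_per_m: float, cached_price_per_m: float) -> float:
    """Blend full-price and cached-price input tokens (prices are per 1M tokens)."""
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    return (fresh * price_per_m + cached * cached_price_per_m) / 1_000_000

# Hypothetical example: 50M input tokens/month, 60% served from cache,
# $2.50 per 1M fresh vs $1.25 per 1M cached.
full = monthly_cost(50_000_000, 0.0, 2.50, 1.25)     # $125.00
blended = monthly_cost(50_000_000, 0.6, 2.50, 1.25)  # $87.50
```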
We still love how optimized the system is compared to when we were spending $900/month with V2. If you look back through this thread or on our Discord server, you can see that the key to reducing costs was switching to mathematical understanding rather than doing all the searching textually.
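At its core, that switch means comparing vectors instead of scanning text. A minimal cosine-similarity sketch with toy three-dimensional vectors (real embeddings would have hundreds of dimensions and come from an embedding model):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_match(query: list[float], memory: dict[str, list[float]]) -> str:
    """Pick the stored memory whose embedding points closest to the query."""
    return max(memory, key=lambda label: cosine(query, memory[label]))

memory = {
    "greeting": [0.9, 0.1, 0.0],
    "weather":  [0.1, 0.9, 0.2],
}
best = best_match([0.8, 0.2, 0.1], memory)  # "greeting"
```

One similarity computation per stored item replaces re-reading and string-matching the text itself, which is where the cost savings come from.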