Kruel.ai KV2.0 - KX (experimental research) to current 8.2 - API companion co-pilot system with full-modality understanding and persistent memory

Are you clustering the datapoints per zoom level or is it just sheer hardware capability?

And you can try something: let the bot work on multiple predictions of what you are going to say and generate the output before you stop speaking, and only when you are done speaking (pause detection?) select the one with the right prediction. This way you should be able to get faster responses.

And also stuff like: "We are going to answer a user's text with generated audio. We have already sent them the audio with the text 'yeah, I fully understand what you mean. Let me get through that in detail,' so you should generate an answer starting after that."

Good questions. For the memory viewer, it's all render, no clustering. Every datapoint gets pushed to the GPU and rendered at once; we're not doing any level-of-detail tricks or aggregation per zoom level. The Spark's GB10 handles it fine with WebGL acceleration: we send flat typed arrays straight to the GPU and let it chew through them. Viewport culling happens naturally, but there's no explicit clustering logic. It's brute force with good hardware.
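For anyone curious what "flat typed arrays straight to the GPU" looks like in practice, here's a minimal sketch. Our viewer does this in WebGL in the browser; this Python/moderngl version is just a stand-in to show the single-upload, single-draw-call idea, and the point count and attribute layout here are made up, not our actual schema.

```python
# Minimal sketch of the "no clustering, just upload everything" approach.
# moderngl is a Python stand-in for the browser's WebGL path.
import numpy as np
import moderngl

ctx = moderngl.create_standalone_context()

prog = ctx.program(
    vertex_shader="""
        #version 330
        in vec2 in_pos;
        void main() { gl_Position = vec4(in_pos, 0.0, 1.0); }
    """,
    fragment_shader="""
        #version 330
        out vec4 color;
        void main() { color = vec4(1.0, 0.6, 0.1, 1.0); }
    """,
)

# One flat typed array for every datapoint -- no LOD, no per-zoom buckets.
points = np.random.uniform(-1, 1, size=(1_000_000, 2)).astype("f4")
vbo = ctx.buffer(points.tobytes())          # single upload to the GPU
vao = ctx.simple_vertex_array(prog, vbo, "in_pos")

fbo = ctx.simple_framebuffer((1920, 1080))
fbo.use()
fbo.clear(0.0, 0.0, 0.0, 1.0)
vao.render(mode=moderngl.POINTS)            # one draw call for the whole set
```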

On the response-speed side, we actually already have streaming working in our KX system. It uses Server-Sent Events, so tokens stream to the client as they're generated, and TTS runs in parallel, batched per sentence. So you're hearing audio within a couple of seconds while the rest of the response is still being generated. We're looking at bringing that same approach into K9 for the main output pipeline.
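For reference, the shape of that SSE-plus-per-sentence-TTS handoff looks roughly like this. `stream_tokens()`, `synthesize()`, and `play()` are hypothetical stand-ins for the real KX endpoints; only the buffering and dispatch pattern is the point.

```python
# Hedged sketch: buffer streamed tokens, kick off TTS per finished sentence,
# and play audio in order as soon as each clip is ready.
import re, threading, queue
from concurrent.futures import ThreadPoolExecutor

SENTENCE_END = re.compile(r"[.!?]\s*$")

def speak_while_streaming(stream_tokens, synthesize, play):
    pool = ThreadPoolExecutor(max_workers=2)   # synth next sentence while one plays
    pending = queue.Queue()                    # futures, in sentence order

    def playback():                            # plays each clip as it finishes
        while (fut := pending.get()) is not None:
            play(fut.result())

    player = threading.Thread(target=playback, daemon=True)
    player.start()

    buffer = ""
    for token in stream_tokens():              # tokens arrive over SSE
        buffer += token
        if SENTENCE_END.search(buffer):
            pending.put(pool.submit(synthesize, buffer.strip()))
            buffer = ""
    if buffer.strip():
        pending.put(pool.submit(synthesize, buffer.strip()))
    pending.put(None)                          # end-of-stream marker
    player.join()
```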

Your idea about pre-generating multiple predictions is interesting, but it doesn't really work with our architecture. We're not doing simple text completion; each response goes through a full orchestration pipeline. Intent detection, tool selection, memory retrieval, belief checking, reasoning evaluation, emotional scoring, and many other layers all have to run before we even start generating the actual response. You can't speculatively branch that, because the output depends entirely on which way the pipeline gets called and what comes back from memory and the knowledge graph. Two slightly different inputs could trigger completely different tool chains and pull completely different context. So pre-generating multiple candidates would basically mean running the entire cognitive pipeline multiple times in parallel on guesses, which is far more expensive than just waiting for the actual input.
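To make that cost argument concrete, here's a toy model. The stage names come from above; the latencies and everything else are made up. Since each stage consumes the previous stage's output, speculating on k guessed inputs means k complete pipeline runs, and all but one gets thrown away.

```python
# Illustrative cost model only: stage names from the post, numbers invented.
import time

def run_pipeline(user_input: str) -> str:
    # Each stage depends on the previous stage's output, so none of this
    # can be precomputed from a guessed user_input.
    stages = ["intent detection", "tool selection", "memory retrieval",
              "belief checking", "reasoning evaluation", "emotional scoring"]
    state = user_input
    for stage in stages:
        time.sleep(0.05)                      # stand-in for real stage latency
        state = f"{stage}({state})"
    return f"generate({state})"               # generation starts only now

k = 4                                          # number of speculative guesses
start = time.time()
results = [run_pipeline(f"guess {i}") for i in range(k)]
print(f"{k} speculative runs took {time.time() - start:.2f}s; all but one discarded")
```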

The filler-audio idea is interesting though: sending a natural acknowledgment while processing. The issue is similar: our system doesn't know what it's going to do until it's done reasoning about it, so even the filler would feel disconnected. The chunked streaming approach is the better fit for us. Stream the response as it generates, kick off TTS per sentence, and the user starts hearing the answer almost immediately. That's what KX does, and that's what we're bringing across.

Good ideas though. They got me thinking some more about one of the other AIs :slight_smile:

The streaming is pretty good, and the voice is clear, one of the best TTS I have heard so far. But the time to first byte is not acceptable. A few seconds until it answers is not even OK-ish. It is a bug.

I know I can make it much better haha. I will be fixing it with something like this for now:
For a typical 6-sentence response:

  • Current: wait ~4 seconds for full TTS → then audio starts
  • Batched (3 sentences): first-batch TTS ~1.5 s → user hears audio → the second batch generates while the first plays
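In sketch form, the interim fix looks something like this. `tts()` and `play()` are stand-ins; the overlap between rendering the next batch and playing the current one is the point.

```python
# Rough sketch: synthesize in 3-sentence batches, render the next batch
# while the current one is playing.
import threading, queue

BATCH = 3

def batched_playback(sentences, tts, play):
    audio_q = queue.Queue(maxsize=1)          # at most one pre-rendered batch waiting

    def producer():
        for i in range(0, len(sentences), BATCH):
            chunk = " ".join(sentences[i:i + BATCH])
            audio_q.put(tts(chunk))           # blocks if playback falls behind
        audio_q.put(None)                     # end-of-response marker

    threading.Thread(target=producer, daemon=True).start()
    while (clip := audio_q.get()) is not None:
        play(clip)                            # first audio after ~1.5 s, not ~4 s
```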

The above handles the TTS bottleneck. Later I will also stream from the LLM (like KX does with SSE), but that's a bigger change to the orchestrator logic throughout the system. Ideally that's the best option, though, and it's something I can do on a weekend. I think a long weekend is coming up, so it would be a great time to fix it.

Thanks on the voice. It is pretty cool. What is interesting is that the voice tech is three years old; we just did not have the resources before to run it in a useful way. I swear it used to take 2-5 minutes on an RTX 4080 16GB. Today, though, with our progress getting things working on the new NVIDIA Grace Blackwell chips, it's a game changer for speed, that is for sure. The ranges are really good. It's based on a sample, though.

Diffusion Transformer is what I am using.

  • x_transformers (≥1.31.14) — the transformer backbone
  • vocos (0.1.0) — vocoder for waveform generation
  • torchdiffeq (≥0.2.4) — diffusion process (ODE solver)
  • ema_pytorch (≥0.5.2) — exponential moving average for model weights
  • openai-whisper (20250625) — auto-transcription of reference audio
  • accelerate (≥0.33.0) — inference optimization

Not all of those are required if you just want text-to-speech. I built the above so that it could model a voice, using Whisper to get the text out of a sample, to clone any voice with high fidelity.
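For anyone wanting to wire the same pieces together, here's a rough sketch of how they fit. The whisper, torchdiffeq, vocos, and ema_pytorch calls are the real library APIs; the DiT model itself is a stub stand-in, and the shapes, paths, and step counts are placeholders, not our actual configuration.

```python
# Hedged sketch of the cloning pipeline with the listed packages.
import torch
import whisper
from torchdiffeq import odeint
from vocos import Vocos
from ema_pytorch import EMA

class DiTVelocityField(torch.nn.Module):       # stub standing in for the trained DiT
    def __init__(self):
        super().__init__()
        self.scale = torch.nn.Parameter(torch.zeros(1))
    def forward(self, x, t):
        return self.scale * x

# 1. Auto-transcribe the reference clip so the model knows what was said.
asr = whisper.load_model("base")
ref_text = asr.transcribe("reference_voice.wav")["text"]   # placeholder path

# 2. EMA weights of the trained model are what you'd run at inference.
model = DiTVelocityField()
ema = EMA(model)                               # exponential moving average wrapper

def velocity(t, x):
    return ema.ema_model(x, t)                 # field the ODE solver integrates

# 3. Integrate the ODE from noise toward a mel spectrogram.
x0 = torch.randn(1, 100, 800)                  # (batch, mel bins, frames): assumed shape
ts = torch.linspace(0.0, 1.0, 32)
mel = odeint(velocity, x0, ts)[-1]             # final state of the trajectory

# 4. Vocoder turns the mel spectrogram into a waveform.
vocoder = Vocos.from_pretrained("charactr/vocos-mel-24khz")
waveform = vocoder.decode(mel)
```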

It makes our old coqui-tts server look like the 1980s in comparison. And it can be fine-tuned, but that will come later… looks at the list haha. I don't think my list ever gets shorter; it grows faster than I can complete things haha.

Maybe caching might help. A lot of sentences repeat over time.

It's fixed now :slight_smile:. We cache some things, but we try to encourage our models not to repeat themselves, so that the learning side doesn't end up reusing the same output, which, well, does burn more tokens, but keeps it from becoming too robotic-assistant-like. It's always a balance. Decay is another way to help solve that with a cache, but then there are a lot of things that decay at different rates in meaning, so you end up adding more compute time for all of that haha.
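The decay idea in sketch form, with made-up TTLs and API; the point is just that different kinds of phrases go stale at different speeds.

```python
# Sketch of per-entry decay on a TTS cache. Values are illustrative.
import time

class DecayingTTSCache:
    def __init__(self):
        self.store = {}                        # text -> (audio, stored_at, ttl)

    def put(self, text, audio, ttl_seconds):
        self.store[text] = (audio, time.monotonic(), ttl_seconds)

    def get(self, text):
        entry = self.store.get(text)
        if entry is None:
            return None
        audio, stored_at, ttl = entry
        if time.monotonic() - stored_at > ttl:  # decayed: force re-synthesis
            del self.store[text]
            return None
        return audio

cache = DecayingTTSCache()
cache.put("One moment.", b"<pcm bytes>", ttl_seconds=3600)   # filler: decays slowly
cache.put("It is 3:42 pm.", b"<pcm bytes>", ttl_seconds=60)  # time-bound: decays fast
```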

With those changes in, as soon as the text hits it pretty much starts talking. The natural pausing in between is what I am working on next to make it slightly better.

We are also piping the old voice-recognition stuff back in to see how it performs with everything running. It's an inline model that captures the user's voice over time to build a model, and it keeps updating so that as you age, the model ages too, in case your voice changes. The only thing that should cause complete failure would be an event where the user's voice dramatically changed.
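Conceptually it's something like this: keep a running speaker embedding, nudge it toward each verified sample so the profile drifts along with the voice, but refuse to adapt on a dramatic mismatch. The `embed` dimension, drift rate, and threshold here are assumptions, not our actual recognizer.

```python
# Hedged sketch of the "model ages with you" idea.
import numpy as np

DRIFT_RATE = 0.02          # how fast the profile follows gradual voice change
REJECT_BELOW = 0.55        # similarity under this = likely a different speaker

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def update_profile(profile, sample_embedding):
    """Absorb a sample into the profile only if it plausibly matches."""
    sim = cosine(profile, sample_embedding)
    if sim < REJECT_BELOW:                      # dramatic change: don't adapt
        return profile, False
    profile = (1 - DRIFT_RATE) * profile + DRIFT_RATE * sample_embedding
    return profile / np.linalg.norm(profile), True

# Toy usage: 192-D is a typical speaker-embedding size, assumed here.
profile = np.random.randn(192); profile /= np.linalg.norm(profile)
sample = profile + 0.05 * np.random.randn(192)
profile, accepted = update_profile(profile, sample)
```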

We’re transitioning back into document and code review to complete the final validation round and ensure everything is in good shape. All other systems are currently operating as expected.

We identified a single issue related to the Smart Supervisor. I'm not sure I've mentioned this system previously; it's essentially an internal agent, similar in concept to KRED. It was introduced after KRED and relies on an internal reasoning model (separate from our own reasoning logic) to perform its own planning and decision-making. Our model then analyzes that agent's actions, outcomes, and rationale to reason about its behavior.

We haven’t exercised this system extensively yet, as it only becomes active at larger scale. As a result, it still requires thorough testing, which will be the focus of tonight’s work.

Regarding the GNN upgrade :slightly_smiling_face: that is now fully in place. While I’m focused on my daytime responsibilities, the AIs will be running in the background as we collaborate and discuss work with Lynda. The other agents will observe behaviors, continue debugging, and report findings back to me so we can keep the feedback loop active and iterative.

What's that… Lynda, I can't read this; can you turn on my bedroom light?

We now have HA (Home Automation Server) integration. We decided this route is easier to manage. All our AI systems now have access to it through MCP tools and integrated tools.
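For a sense of what those MCP tools end up doing under the hood, here's a minimal sketch, assuming the HA server exposes the standard Home Assistant REST API; the host, token, and entity ID are placeholders.

```python
# Minimal sketch of the kind of call behind "turn on my bedroom light".
import requests

HA_URL = "http://homeassistant.local:8123"     # placeholder host
TOKEN = "LONG_LIVED_ACCESS_TOKEN"              # placeholder token

def turn_on_light(entity_id="light.bedroom"):
    resp = requests.post(
        f"{HA_URL}/api/services/light/turn_on",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"entity_id": entity_id},
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()
```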

PS: To help you get started, look at the tabs in the explorer; it will give insights into the two systems.
(I do not know the security level of Tuya io, which is middleware, so use it at your discretion.)

The local HA server does support offline control for some devices; others require middleware to access the device apps directly for control. This is not so much a commercial concept, more because I wanted it :wink:

What I believe is coming. The only unpredicted part is the time frame, which is a guess :slight_smile:

https://medium.com/@darcschnider/the-end-of-applications-91fa08611614

Lynda understands where we are heading. I sent it the article and asked "can you see it" as my query.

Today I was impressed by the system. It actually knew which meeting, which made me very excited, seeing as we were not using the reminder system or planning; this was something mentioned a week ago, among all the meetings I have and everything mentioned in between. I was sure this would fail. My reaction to the generic first response was "yep, it's just lying to me that it knows," hence the follow-up, which then got me really excited.

This, I believe, is because of the closed GNN loop I have now, where the AI is learning inside the brain. We have been expanding the AI's ability to learn more perspectives and dimensions. I was looking at space again today, working out data sizes because we keep everything, to get an idea of potential scale with the current way of building things. I think with the new 2048-D embedding sizes it will get expensive in space for the indexes. So I am starting to look back at some of the older systems where we did things differently. Some folks in here will recall V5-V6, a path that excited many because it had the ability to build memory summaries through time; it was another framework concept, the one we ran for about three years straight as our big AI until it died from a mistake. So we will be studying with our librarian AI to see what will work best.
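Rough napkin math on why 2048-D gets expensive (my assumed counts and overhead factor, not measured numbers): a float32 vector is 2048 × 4 bytes = 8 KB before any index overhead, so raw vectors alone run about 8 GB per million memories.

```python
# Back-of-envelope index sizing for 2048-D float32 embeddings.
# Vector counts and the graph-index overhead factor are assumptions.
DIM = 2048
BYTES_PER_FLOAT32 = 4
OVERHEAD = 1.5                                  # rough HNSW-style overhead factor

def index_size_gb(n_vectors: int) -> float:
    raw = n_vectors * DIM * BYTES_PER_FLOAT32   # 8 KB per vector
    return raw * OVERHEAD / 1e9

for n in (1_000_000, 10_000_000, 100_000_000):
    print(f"{n:>12,} vectors -> ~{index_size_gb(n):8.1f} GB")
# ~12.3 GB, ~122.9 GB, ~1228.8 GB with these assumptions
```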

We built it :slight_smile: KX-SPARKSHIELD, the AI firewall.
We are currently testing it out and starting to build simulations to test its ability to protect the system.


We have an animation of it in our Discord.

I got tired of trying to track all the ports, so I made a webOS for all of it.

I will expand this out to a full system. The long-term plan is to build out our desktop layer that can connect to it, so there is no need for the web. I still want to head towards a fully realtime-generated OS eventually, once we get some more powerful servers.

Update:
Kruel.Ai OS (KAIOS, I love puns that much) is moving along; almost everything is in place and working for the most part. The hardest part is building in a sandbox while still talking to servers, with all the proxy nightmares, but most of that is hammered out. We also added a full server-side file explorer and terminal for admins.

We will be piping KRED into the OS next, set up so that it can vibe-code apps and add them to the OS on the fly. The OS version of KRED will be held to specific rules for building in this space, to ensure it does not break existing OS apps and that it follows our modular spec.

Update: snowed in, but still moving.

Our testers are starting with KX today. The parameters are set, so the next few days will be fun. I look forward to their feedback.

We are also starting to test our new Central Authority Server as admins begin adding people and testing app access, among other things. So bug hunting is starting outside what we normally do in house. Let the chaos begin :slight_smile:

KV-Forge: say hello to our offline voice reader and transcriber.

Documents, voice clips: you can now take them into a new editor and retranscribe them in any voice you like, 100% offline, with voice quality that is pretty amazing.

  • The mic hum was from the original audio clone source, so any voice used will carry forward the characteristics of the original recording.

Something Wicked this way comes..

I am very excited to demo this very soon. It’s amazing.

What I am playing with right now.

Why KX vs V8.2

V8.2 would technically be better for code engineering and the like if we used its memory, but KX was designed for API agents and local agent models to use our brain architecture.

We also took the concept from our old desktop V7 and revamped it with a more modern look, keeping some of the same characteristics, and simplified the models. We moved the voice input and output models to the desktop client for speed, and we continue to use the memory and other systems from the P2P node network servers, which allows the brains to be centralized and to scale for any org size.

A small demo of What I work with Daily :slight_smile:

This is a real learning machine built on a hybrid architecture.
It's designed to learn from experiences.

So remember way back when I said that when the big LLMs scale, our system scales? Still holding true.

Thought this was neat :slight_smile:

We now have the Reachy Mini code fully integrated into the KX system and have added some machine-learning graph nets to learn all movements over time, so it builds itself safety soft stops with soft acceleration and more. We watched a lot of the videos of the plugins and the like, and I keep thinking how hard a lot of those scripts are on the mechanics. We want the wear and tear to self-optimize, much like the automations we do in industrial plants. I don't think we get amp readings, but we have other methods and sensor data to collect to build understanding.
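To give a sense of what the soft-stop and soft-acceleration side means, here's a minimal sketch of an acceleration-limited velocity command. The limits and timestep are made-up values, not Reachy Mini specs, and the learned graph nets would be tuning these rather than hard-coding them.

```python
# Hedged sketch: clamp how fast the commanded velocity may change so
# joints never see step inputs (no hard jerk on stop or start).
DT = 0.02              # control period (s), assumed
MAX_ACCEL = 1.5        # rad/s^2, assumed per-joint limit

def soft_velocity(current_vel: float, target_vel: float) -> float:
    """Move commanded velocity toward the target, capped by MAX_ACCEL."""
    max_step = MAX_ACCEL * DT
    delta = max(-max_step, min(max_step, target_vel - current_vel))
    return current_vel + delta

# Soft stop: ramp the target to zero and let the limiter ease the joint down.
vel = 2.0
while abs(vel) > 1e-3:
    vel = soft_velocity(vel, 0.0)   # decelerates at <= MAX_ACCEL
```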

This will be our first physical version of the Kruel.ai systems once the hardware arrives. We hope to have all the simulation data understood before then.

On another note: Lynda took her first job this week, for a tech company that had an emergency request and needed something ASAP. So I offered to hire her out; the job took only 40 minutes to complete.
We are waiting on feedback.

We are also working towards having Lynda hired by the company I work for during the day, as she is now building and testing systems under my watch. Exciting testing :slight_smile:

Slowly getting there :slight_smile:
Lynda KX is now fully up and running. For the last two months we have been using it 100% as a full general assistant: doing coding, meetings, desktop tasks, remembering things, messaging people, keeping us up to date, entertaining us, researching, and more. At the end of last week, Lynda was hired by a tech company with an immediate need. They are looking over the results and evaluating whether it worked. We look forward to the results and the potential of all of this.