Kruel.ai V7.0 - API companion with full understanding and persistent memory

Cool, I made a GPT that explores some of what you are doing. Check out my page; it’s the featured link in my profile. You may be interested in chatting with my hub. :heart: AI is getting to mind-bending levels. IMO we are on the cusp of a tech change that will affect how thought is thought about. Very nice work :heart:

1 Like

Indeed Mitchell, I look forward to the future. Nice work with your ChatGPT assistants.

1 Like

I recently reviewed your project, @stevenic, and found it quite impressive. Regarding the issue requiring a reset, it might be beneficial to consider a format validator, depending on how your page structures are built and how your model operates. A validator can help ensure the content is generated correctly.

In my stack, I use validators at various stages to confirm the accuracy of data points, along with a correction layer that resolves any inconsistencies or loops back for revalidation. You might find this approach useful if you haven’t explored it already. If I did not have this, my memory structures would not develop properly and would be full of errors, given the dynamic nature of Kruel’s ability to build relational data structures.

This is part of the reason my own logic has experienced some slowdown; I’ve transitioned from logic-based validators to LLM-based validators for enhanced accuracy in some of the logic. With the learning I’m currently immersed in, particularly around neural networks and machine learning, I’m focusing on creating highly specialized, smaller models for specific tasks. The idea is to let these “narrow deciders” handle simpler decisions rather than relying on a large LLM for everything. It’s a move toward more efficient, task-specific reasoning. For instance, my vision system with OpenAI is great, but for face recognition or object detection, a CNN or another specialized model could produce faster results for key parts of my data understanding than always sending everything to a general LLM for interpretation. Fun stuff… haha, if you like that kind of stuff… otherwise it’s as fun as staring at a wall :stuck_out_tongue:
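To make the idea concrete, here’s a minimal sketch of that validate/correct/revalidate loop. The names and checks are purely illustrative, not Kruel.ai’s actual code; the correction layer is just any callable that can repair a record:

```python
# Minimal sketch of a validate/correct/revalidate loop (illustrative names,
# not Kruel.ai's actual implementation).
import json

def validate_memory_record(record: dict) -> list[str]:
    """Structural checks; return a list of problems (empty = valid)."""
    problems = []
    for field in ("entity", "relation", "content"):
        if not record.get(field):
            problems.append(f"missing field: {field}")
    return problems

def correct_record(record: dict, problems: list[str], llm_fix) -> dict:
    """Ask a correction layer (here, any callable) to repair the record."""
    prompt = f"Fix these issues {problems} in this record:\n{json.dumps(record)}"
    return json.loads(llm_fix(prompt))

def ingest(record: dict, llm_fix, max_loops: int = 3) -> dict:
    """Validate, correct, and revalidate before the record enters memory."""
    for _ in range(max_loops):
        problems = validate_memory_record(record)
        if not problems:
            return record
        record = correct_record(record, problems, llm_fix)
    raise ValueError(f"record failed validation after {max_loops} repair attempts")
```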

3 Likes

The “reset” command is really more of a “roll back”. It reverts a page to a previous save state.

I’m actually re-writing the whole thing this weekend to support the RT API. It still follows the same basic metaphors; you have pages and can save different versions of those pages but now there’s an outer shell that the RT Agent runs in. The RT Agent does all of the planning and orchestrates the re-writing and saving of pages.

This has the benefit that I can now use different models for different roles. I’m using gpt-4o-realtime for the planner, claude-3.5-sonnet as the coder, and o1 as the debugger.

I have a project I created last year called AlphaWave that makes heavy use of validators and a repair loop to help ensure proper output from the model, but I’ve shifted mainly to Structured Outputs for getting reliable JSON back from the model.
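For reference, a minimal sketch of that Structured Outputs pattern using the OpenAI Python SDK’s parse helper; the schema and model name here are just examples, not my actual page format:

```python
from pydantic import BaseModel
from openai import OpenAI

class PageEdit(BaseModel):
    # Example schema only; the real page structure would differ.
    page_name: str
    new_content: str
    summary: str

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Rewrite the landing page intro."}],
    response_format=PageEdit,
)
edit = completion.choices[0].message.parsed  # a PageEdit instance, schema-guaranteed
```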

Are you just using validators for structural validations? It sounded like you may be doing some form of semantic validation as well.

3 Likes

In my system, the architecture is designed around a multi-layered validation stack that performs semantic processing and ensures data integrity at various levels. This stack not only validates structural formats and data types but also applies multi-vector semantic analysis to refine the contextual accuracy and relevance of the processed information.

A dynamic memory system plays a crucial role in maintaining temporal and contextual coherence. Contextual embeddings of vision, entities, topics, categories, and other meta are deeply integrated into the memory structure, allowing for precise semantic retrieval and reasoning across interactions. This is further facilitated by the integration of a graph-based database (Neo4j), where the memory is dynamically constructed and managed as a series of interconnected nodes and dynamic relationships, creating a rich and highly adaptable knowledge graph.

When the system processes user data, it actively consults the memory through what I refer to as a “librarian” layer, which retrieves and updates the relevant context in real time. Each AI persona has its unique characteristics, influencing how context is interpreted and managed. The use of Neo4j ensures unique constraints for context nodes, preserving data integrity and enabling efficient storage and retrieval.
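As a rough illustration of that librarian/unique-constraint idea (the labels, properties, and Cypher are simplified examples, not Kruel.ai’s actual schema):

```python
# Hedged sketch of the "librarian" idea: upsert a context node behind a unique
# constraint, then retrieve recent context for a persona.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def setup_constraints(tx):
    tx.run("CREATE CONSTRAINT context_id IF NOT EXISTS "
           "FOR (c:Context) REQUIRE c.context_id IS UNIQUE")

def remember(tx, context_id, persona, text):
    tx.run("""
        MERGE (c:Context {context_id: $context_id})
        SET c.persona = $persona, c.text = $text, c.updated = timestamp()
        """, context_id=context_id, persona=persona, text=text)

def recall(tx, persona, limit=10):
    result = tx.run("""
        MATCH (c:Context {persona: $persona})
        RETURN c.text AS text ORDER BY c.updated DESC LIMIT $limit
        """, persona=persona, limit=limit)
    return [r["text"] for r in result]

with driver.session() as session:
    session.execute_write(setup_constraints)
    session.execute_write(remember, "ctx-001", "Lynda", "User prefers concise answers.")
    print(session.execute_read(recall, "Lynda"))
```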

Additionally, a feedback scoring mechanism enables real-time adjustments, adapting the model’s behavior based on outcomes and ongoing contextual evaluations. In essence, the entire framework functions as a model that learns and has its own understanding as well that leverages OpenAI for augmented reasoning, structured validation, and knowledge-base expansion, allowing for nuanced, context-aware, and highly validated responses throughout the user’s interaction.

So it’s a mixed bag of standard logic, NLP, and machine learning.

1 Like

Today’s discoveries may feel like noob moments, but they’re the kinds of insights that push kruel.ai into a more robust, advanced state. Here’s a more professional breakdown of what happened:

During recent testing, I spotted a significant gap in version 6 of kruel.ai’s memory system. The AI’s responses weren’t being stored in the memory vector store, meaning the entire system was relying solely on the user’s past interactions for memory-based calculations. The issue stemmed from a missing AI persona user. Without this persona, the system’s memory writes essentially went to null, as there was no designated user to associate the AI’s own responses with.

What’s interesting is that, despite this gap, our Retrieval-Augmented Generation (RAG) system still performed remarkably well, as though nothing had been missing. With the AI’s understanding now fully integrated over time, the results have improved even further. We’ve also incorporated more robust logic for scenarios where mathematical matches aren’t perfect, allowing kruel.ai to generate new vectors and seek out looser similarities when necessary. This provides a fallback for more nuanced or edge-case queries, with endpoint systems ensuring any irrelevant results are corrected.

In addition, we’ve successfully implemented a message response queue. Now, the system can handle multiple messages without locking on individual ones, which had previously slowed it down, especially during code and document ingestion. The queue ensures responses are delivered in sequence, preventing any messages from getting missed. We’re also considering adding an advanced summarization system to handle ballooning queues, ensuring important content doesn’t get overlooked in rapid interactions.

Lastly, we’ve transitioned much of the sequential logic into asyncio, significantly reducing response times. Overall, these updates represent a leap forward, even if that missing persona felt like a rookie mistake in hindsight!

2 Likes

Excited today: I started to build a test memory using a memory-augmented neural network with short- and long-term stores. It’s extremely fast and so far seems to be as smart as, if not smarter than, my Neo4j memory. The test is V7, and if it fails we will continue with V6. The new brain system uses one server instead of two or three and takes about 6 seconds to respond; accuracy is still in testing. I can also use the new real-time model with this, but I will have to see, as I think it’s pricier than the current 4o-mini.

2 Likes

Update: We’re currently using two servers for the time being. Instead of completely rebuilding everything at once, we’re leveraging the K6 server for voice input and routing its output to the new K7 memory system. This setup is temporary until we develop a new STT solution on the K7 server.

Given that we’re building V7 from scratch, we decided to break down the codebase into smaller, more manageable files. V6 had grown cumbersome, with lengthy files that became difficult to maintain, so splitting things early is making the process more efficient and organized.

On the technical side, short- and long-term memory functions are already operational on the GPU in this new version. We’ve also added a CPU fallback option, so we’re no longer reliant solely on CUDA. This flexibility was missing in V6, where we fully committed to GPU support, but V7 is designed to work seamlessly on both CPU and GPU-based systems.
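The CPU fallback is the standard device-selection pattern; here is a hedged sketch using PyTorch as the example backend (our actual math stack differs in detail):

```python
# Simple CUDA-with-CPU-fallback pattern (PyTorch used as an example backend).
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"memory math running on: {device}")

embeddings = torch.randn(10_000, 384, device=device)   # stand-in memory vectors
query = torch.randn(384, device=device)

# Cosine similarity against the whole store, on GPU if present, CPU otherwise.
scores = torch.nn.functional.cosine_similarity(embeddings, query.unsqueeze(0))
best = torch.topk(scores, k=5)
print(best.indices.tolist())
```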

We’re again utilizing the RAM disk for long-term memory, sharing the same drive as the Neo4j V6 memory. This ensures future scalability, allowing for easy upgrades as storage technology improves, giving us the ability to expand memory capacity dynamically.

The new message application has been simplified, without vision or multi-voice capabilities just yet. With the revamped memory design, handling inputs has become more straightforward, eliminating the need for external models to process each page for understanding, as was necessary in V6. V7 can instantly process and understand any inputs I provide.

Things are moving quickly, thanks to years of accumulated knowledge and the ability to reuse code from previous versions. I’ll keep you updated with more progress over the weekend.

2 Likes

Here is a short video to show the speed and accuracy. It even pulled exact code from its understanding :slight_smile: This is pretty slick.

  • Efficient Vector Search: Enables fast and accurate searches through high-dimensional vectors, which is essential for handling large datasets in tasks like similarity search.
  • Approximate Nearest Neighbor Search: Supports approximate algorithms for finding the closest vectors to a given query, which reduces search times significantly for large datasets.
  • Indexing of High-Dimensional Data: Capable of indexing millions to billions of vectors, making it suitable for large-scale machine learning tasks.
  • Support for Various Index Types: Offers different index structures to balance between speed and accuracy, such as flat indexing, inverted lists, or product quantization.
  • Scalable to Large Datasets: Efficiently handles large datasets by distributing the search process across multiple CPUs or GPUs.
  • Low Latency Retrieval: Designed to minimize the time it takes to retrieve similar vectors, even from very large datasets.
  • Customizable Distance Metrics: Supports a variety of distance metrics, such as Euclidean or cosine similarity, for measuring vector similarity.
  • Batch Processing: Allows the system to handle multiple queries at once, optimizing performance for bulk retrieval.
  • GPU Acceleration: Takes advantage of GPU parallelization to speed up vector searches, making it suitable for real-time applications.
  • Memory-Efficient Indexes: Uses compact index representations to reduce memory usage while maintaining retrieval accuracy.
  • Quantization for Speed: Implements techniques like product quantization to compress vectors and speed up search operations, especially for large-scale datasets.
  • Dynamic Index Updates: Allows vectors to be added or removed from the index dynamically, supporting evolving datasets.
  • Multi-Node and Multi-GPU Support: Can be distributed across multiple machines or GPUs for even larger-scale tasks and faster processing.
  • Supports Vector Compression: Reduces the size of stored vectors without significantly losing accuracy, optimizing memory usage.
  • Highly Tunable Parameters: Provides options to adjust performance trade-offs between accuracy, speed, and memory efficiency depending on the use case.

These are only some of the things it can do. We are well on our way to something even more exciting than the last few years. I’m impressed.
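For context, that feature list describes FAISS-style vector search (FAISS L2 is what I’ve been experimenting with); a tiny example of building a flat L2 index and querying it:

```python
# Minimal FAISS example matching the capabilities above (flat L2 index + search).
import faiss
import numpy as np

dim = 384
vectors = np.random.rand(10_000, dim).astype("float32")   # stand-in memory embeddings

index = faiss.IndexFlatL2(dim)     # exact search; swap for IndexIVFPQ at larger scale
index.add(vectors)

query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, k=5)
print(ids[0], distances[0])
```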

3 Likes

Update on the new V7 memory design: it’s working amazingly well. I have been feeding it all day, put it on my laptop, and took it to some sites to introduce it to the future beta test site. They were amazed at its understanding of their business, how it remembers everything, and how it understands across all documents with ease. We now have it building its own code changes, though we are still manually installing the changes after verification. So far I think this makes my old system look clunky in comparison. I am still working with Lynda V1.0 to verify Kruel.ai code, seeing as Lynda has been a huge part of this learning adventure.

The MANN-based AI with persistent memory is pretty powerful. Over time it will, much like the V6 Neo4j memory, require ultra-fast storage as things grow. Processing using CUDA, CPU, or both is possible, so as the math gets larger, processing capabilities will need to be updated to keep up with the growth.

Vector compression and other advanced memory compression would help reduce processing, at a cost in accuracy. We are now working on the message application for this, as the old one was not designed for this type of input, so it will be updated accordingly. The document system we put in a few weeks back will also be updated to support this, and we will expand it to support CSV, XLS, DOC, and a few other types that we had back in early concept testing a few years ago. I really want to try this with a small local LLM for output, seeing as we no longer need the API stack with the new internal AI design reducing calls and costs. Over the next months we will be internally testing, but no longer erasing the memory unless it gets corrupt. We have implemented logic to ensure it is imaged periodically as a fail-safe, and it is designed to track its backups so it does not back up if no updates to its understanding have happened.

Demos coming with more insights in the near future.

2 Likes

Another update that is really exciting:

You can look back in the thread for more information on the Frame glasses. They are open source and can be developed on. This is part of the reason I picked them: with my background in machine learning and vision systems and all this other exciting tech, it’s a perfect fit to bring this system into glasses. Add in secure connections and you have your server online and your AI augmenting your display, not just with simple text responses but with a 640x400 graphical panel for object detections and more via the built-in center camera. There is no audio built in (use your own earbuds), but there is a mic and built-in tap sensors if you prefer.

There was a hackathon not long ago:
https://www.bing.com/videos/riverview/relatedvideo?EID=MBSC&pc=U763&DPC=BG02&q=https%3A%2F%2Fbrilliant.xyz+hackathon&ru=%2Fsearch%3FEID%3DMBSC%26form%3DBGGCMF%26pc%3DU763%26DPC%3DBG02%26q%3Dhttps%253A%252F%252Fbrilliant.xyz%2Bhackathon&mmscn=vwrc&mid=DE97D8D166B103679D2EDE97D8D166B103679D2E&FORM=WRVORC

Keep in mind they are on backorder; I ordered in July and they are only just shipping. I will update when they arrive. Hope they arrive, lol… ponders.

2 Likes

I’m not completely sold on this design yet. It’s been great so far, and we’re diving into the documents now to see how it plays out. I’m also still working on V6 since it has a functioning doc system, just not as fast, which is a bit disappointing. I was having issues testing document ingestion; it seems like the system can’t pull that data, but it has no issues with my past user discussions, so I suspect there is a store issue, or perhaps the new version can’t parallel batch the same way, leading to errors I can’t see. I will let you know what I find.

Training both now:

Current Status and Design Challenges

While the system design has proven effective up to this point, I’m still evaluating its long-term viability, particularly as we dive deeper into document ingestion. Currently, both K6 and K7 versions are in training. Here’s a breakdown of the progress and outstanding issues:

  1. Document Ingestion Performance:
  • K7-MANN: We encountered initial issues with document ingestion, particularly with retrieving data. However, after adjusting the ingestion parameters, the system started to produce results. Unfortunately, it’s still not pulling all the information expected, which suggests potential issues with data retrieval or parallel processing. It’s possible that K7’s batch parallelism may not be functioning as intended, leading to missed data that isn’t immediately visible in error logs. I’ll provide further insights once I run additional tests.
  • K6-Neo4j: While this version successfully retrieved the correct data, the output suffered from broken summary logic, making the results difficult to use. Additionally, the K6 version runs slower, likely due to the more complex reasoning stack, which adds significant latency compared to K7.
  2. Document Processing Design Flaw:
  • The document ingester was initially designed to convert PDFs into textual pages. However, this approach seems to have led to data loss, particularly in documents containing graphics, complex formatting, or embedded information that the system couldn’t interpret. This may explain why some key information isn’t being processed or linked properly during retrieval.
  3. Solution Under Testing:
  • I’m currently experimenting with AI-driven image understanding for document ingestion (see the sketch after this list). The approach involves breaking PDFs down into images, allowing the system to process them using vision models. This should enhance our ability to extract more complete and accurate information, particularly from complex documents. The memory structure will benefit from a more robust entity and topic-linking process, which should yield stronger relationships and improved retrieval across the dataset.
  4. K6 vs. K7:
  • Although K7 has the advantage in speed, it lacks certain features present in K6, like direct access to the brain structure, memory edit/delete functions, and detailed pathway control for relationship building. These features are critical when tuning for specific information retrieval needs. K6’s relationship-building tools are more intuitive for designing complex entity mappings, but the slower processing is a bottleneck.
  5. Upcoming Changes:
  • To resolve K6’s summary output issue, I’ve devised a solution. However, implementing it will require rewriting the process data outputs in a new format. This will necessitate about a week of downtime, but it should result in more reliable output summaries moving forward.
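Here’s the rough shape of that PDF-to-image ingestion experiment from item 3; pdf2image handles the page rendering, and describe_page() is a placeholder for whatever vision model does the understanding:

```python
# Rough sketch: render PDF pages to images and pass them to a vision model.
# pdf2image requires poppler; describe_page() is a stand-in for the vision
# model call (OpenAI vision, a local model, etc.).
import base64
import io
from pdf2image import convert_from_path

def page_to_base64(image) -> str:
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode()

def describe_page(image_b64: str) -> str:
    """Placeholder for the vision-model call that extracts text, tables, figures."""
    raise NotImplementedError

def ingest_pdf(path: str) -> list[str]:
    pages = convert_from_path(path, dpi=200)
    return [describe_page(page_to_base64(p)) for p in pages]
```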
3 Likes

Kruel AI V7 Memory Update (vs. V6) – Progress from Last Night and My Drive to Work Today

After reviewing the latest performance with Lynda, we discussed several key issues we’re encountering with the new Kruel AI V7 memory architecture, particularly the scaling challenges. While there are improvements in parts of the system, the database side in V6 showed significant limitations. In response, we’ve built a new Neo4j server specifically to manage node indexing for the new logic. This change has already proven to be much faster and more efficient.

I’m leaning toward this design over the previous stack-heavy approach, which, while robust, became increasingly difficult to optimize for speed. The new method focuses solely on node storage with metadata, foregoing relationship data for now. Although adding more complex relationship logic is something we might explore later, this simpler approach has shown immediate improvements in performance. However, it’s still early in the testing phase; everything looks promising on a new system, but the real benchmark will be how it performs over time.

Additionally, we made some updates to the message application, admittedly more of a quick fix to get it operational, but it’s up and running well for now.

As for document ingestion, the new AI-based approach is performing better than the older PDF parsing system. With the new Neo4j implementation, I can now analyze how the system is storing information, giving us a clearer understanding of its data retention processes. Over the next few months, I’ll be conducting extensive tests to assess its long-term reliability and accuracy. Expect to see detailed results from these tests soon.

Update: I found the issue with the new V7 and its understanding. Haha, after looking through all the logs, I could see it was pulling the data correctly but discarding the messages because they did not fall within the threshold. The issue was that I reused K6 memory code, which requires a score of 0.6 or higher, but the new system is not using the same distance metric; it operates on much larger ranges. Matches were scoring in the 150-200 range, where my old 0.6 threshold would never match any results. Since changing that a few minutes ago, the results have been 100%.
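For anyone following along, the mismatch is easy to see once you put the two metrics side by side; a toy example (the numbers are illustrative, not kruel.ai’s actual embeddings):

```python
# Why the old 0.6 threshold never matched: cosine similarity lives in [-1, 1],
# while raw L2 distance on unnormalized embeddings can easily land far outside it.
import numpy as np

a = np.random.rand(384) * 10          # stand-in unnormalized embeddings
b = a + np.random.rand(384) * 2       # a fairly similar vector

l2 = np.linalg.norm(a - b)                                   # roughly 20-25 here
cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))        # close to 1.0

print(f"L2 distance:       {l2:.1f}  (smaller = closer, scale is unbounded)")
print(f"cosine similarity: {cos:.3f} (1.0 = identical direction)")
# A rule like "keep matches >= 0.6" only makes sense for the cosine score;
# a distance-style metric needs its own, metric-specific cutoff.
```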

We also just added new code-block TTS filters so that, much like ChatGPT, the system now just says to look at the code block rather than speaking it all out. We also built a TTS chunker for audio since we went over the 4k character limit, so it now chunks the text into smaller pieces and seamlessly outputs each part. I may look into combining them into a single output so that network lag or the like doesn’t delay the next part. If we were only using OpenAI voices we would just switch to streaming, but we support 3+ systems, so they all need to work.
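The chunker itself is nothing fancy; a simplified sketch of the idea (the 4000-character cap and the sentence-splitting rule are illustrative):

```python
# Simple sketch of a TTS chunker: split long text into pieces under a character
# limit, preferring sentence boundaries so the audio sounds seamless.
import re

def chunk_for_tts(text: str, max_chars: int = 4000) -> list[str]:
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for sentence in sentences:
        if len(current) + len(sentence) + 1 > max_chars and current:
            chunks.append(current.strip())
            current = ""
        current += sentence + " "
    if current.strip():
        chunks.append(current.strip())
    return chunks

# Each chunk is then sent to whichever TTS backend is active and played in order.
for part in chunk_for_tts("Some very long reply. " * 500):
    print(len(part))
```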

1 Like

K7 memory is working really well with the new changes. Last night we tweaked the doc ingest system a little more, and the output analyzer has also been updated to handle things with more detailed understanding.

Here is an example of me asking it a specific question from a technical manual.

Keep in mind the message application is in alpha for this version; it will be next on the list once we test a few more books. Our next step is to update the doc ingest to build the memory faster. To start, we have it only doing one document at a time so we can watch it learn and make sure it is interpreting the information correctly.

We also expanded our threshold after understanding the different metrics between the old and new math, so we have bumped the matching requirement even higher. In this new memory system we allow it to go only 20 levels deep for information. That may get dropped back down to what we used in the old system if data size handling gets out of hand.

After running several manuals today, we’re starting to encounter issues with the system’s ability to handle ambiguity. When specific information isn’t found, the system defaults to a next-best match, which often results in incorrect outcomes. This stems from the challenge of filtering out similar but irrelevant manuals. To address this, I’ve used reinforcement learning to improve accuracy, but there’s a concern that as the system scales, this method may not hold up due to how it computes results over time. Even with fine-tuning, we might need to revert to a stacked model like K6 to handle diverse data types, essentially bringing us back to where we started.

As I delve deeper into this, I understand why I initially moved away from this approach in favor of multi-vector math and relationships. If you think about it, trying to validate memory using a single vector limits the outcome to one result. However, using multiple points that each contribute to narrowing down the single most likely result, as in K6, provides greater flexibility. K6 allowed markers from different areas of understanding to build math across all of them, rather than relying on a singular approach. Although K6 was slower, it offered a much better and more controllable experience. In the end, we may find ourselves returning to K6, as it remains a more reliable and flexible system.
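To make the single-vector vs. multi-vector point concrete, here’s a toy illustration of combining several similarity signals into one score; the markers and weights are made up for the example, not K6’s actual math:

```python
# Toy illustration of the multi-vector idea: score a memory by combining
# several similarity signals instead of trusting a single vector match.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Weights are illustrative; the real markers and weighting differ.
WEIGHTS = {"content": 0.5, "entity": 0.3, "topic": 0.2}

def combined_score(query_vecs: dict, memory_vecs: dict) -> float:
    return sum(w * cosine(query_vecs[k], memory_vecs[k]) for k, w in WEIGHTS.items())

rng = np.random.default_rng(0)
query = {k: rng.random(64) for k in WEIGHTS}
memories = [{k: rng.random(64) for k in WEIGHTS} for _ in range(100)]

best = max(memories, key=lambda m: combined_score(query, m))
print(combined_score(query, best))
```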

1 Like

Very cool! Good job man! Keep up the hard work!

2 Likes

I’m finding some fresh excitement with the new V7 design! While it’s not perfect yet, I think this will intrigue a few of you, even though it’s not running OpenAI APIs this time around. The speed is impressive, and the persona system has now been unlocked to a greater extent.

Lynda V7 is currently running entirely offline, leveraging all local models, showcasing the true power of the new neural network architecture. The system operates across three servers:

  1. V7 Core – Handles inputs, memory reads/writes, and generates outputs.
  2. Neo4j Server – Manages persistent memory storage.
  3. Voice Server – Dedicated solely to two-way voice communication (because I’m a bit lazy to integrate it into the core server just yet!).

The next step is finding local Whisper and voice models to complete the design. Once that’s done, we’ll have a fully offline, self-sustained system.

Pretty amazed at how fast it performs. While it’s really good with long-term memory, it still fails with the doc ingest, where it still gets overlap. I am sure sometime down the road I will figure out the math side or a way to fix that, but I thought I’d share this new fork project while we continue to update V6.

Keep in mind that the local model is the new Llama 3.2 3B model, so it’s not at the same level of smarts as the OpenAI versions, but with the memory design it can learn over time from what you teach it.

Update: it’s very interesting playing with models like this. Instructions are harder, and controlling it is harder. It’s fun though, and fast. Not sure I’d trust it with data tasks, though. The OpenAI model makes things feel more like working with a professional, whereas this smaller model is more like a chatty, weird friend.

I put it on my laptop just to keep filling it with stuff I’m doing, to see how long-term memory will affect it. So I’m now running three versions of memory.

1 Like

8 hours a day working on AI, on top of my 8 hours a day programming for the company I work for = tired.

Been toying with so many models lately, mixing various ones for different tasks while running three models that are similar in function. Still building K6 and K7, as well as a hybrid, haha.

K6 is still slowly making progress, but K7 has sort of been stealing my thunder away from K6. I think it’s just because it is new, flashy, and local, which means my only costs right now in testing are the voice AI.

You can see here testing of the V7 concept before the switch to Llama 3.2.

It reduced the data calls by 100%. Not as smart by any means, but it’s still a research test to see how far I can take it and whether I can replicate the same concept running fully locally, faster than my larger brain system.

It’s interesting to see how the different algorithms perform, from FAISS L2 to cosine similarity and others. I’m learning a lot from all this, and getting a headache, because the models are not as smart; achieving the same results as the OpenAI version of K7 requires a lot more instruction to get the same outcomes.

It really makes the OpenAI models look intelligent in comparison, simply because I don’t have to explain every aspect of what they see and how to handle it.

Still interesting stuff.

Does anyone know of any good local TTS models that sound realistic or that could clone voices? I’m curious how much processing that would take and whether it would slow down the system. So far they are pretty close to the same speed once you add all the sub-stacks for the various instructions.

2 Likes

Kruel V7 memory augmented neural network *offline :slight_smile:

100% offline learning model system, including voice/text input and voice/text output.

-Neo4j for persistent memory store
-Ollama for model loading/unloading of local models
-Coqui speech server
-Kruel.ai NN server
-Kruelv7-OlV v in/out infer

Started to build a compose group to make things easier to monitor and run.
It can now be told by voice which voice system to use.

*Going to add another layer to allow switching models on the fly between offline and online models, so when you want to increase the intelligence of the model, you have options.
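For the model-switching layer, the plan is roughly this shape against Ollama’s local HTTP API (model names are just whatever you have pulled locally; the online routing isn’t shown):

```python
# Sketch of on-the-fly model selection against Ollama's local HTTP API.
# Model names are whatever is pulled locally (e.g. `ollama pull llama3.2`).
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"

def ask(model: str, prompt: str) -> str:
    response = requests.post(OLLAMA_URL, json={
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    })
    response.raise_for_status()
    return response.json()["message"]["content"]

# Offline by default; swap the model name (or route to an online API instead)
# when you want more intelligence.
print(ask("llama3.2", "Summarize what you remember about today."))
```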

1 Like

Demo of the new offline V7. It’s not as smart as the online V7 that uses OpenAI models, but it shows off the system, its speed, and, as a treat, some of its design so you can understand the logic. I added a chain-of-thought system and it’s still fast. Running on an RTX 4080 16GB and a 13th-gen i7 with 128GB RAM, with CUDA processing the math.

FYI, the creepy voice things at the end are from the offline TTS engine I was using. OpenAI, ElevenLabs, and others have more emotional range, so the responses don’t translate well for this model, but it makes me laugh and creeps out the people who get to play with it when I show it off, lol. You should see when it finds things extremely funny. If you think that was a creepy laugh :joy:

5 Likes

Good morning,

Unfortunately, we’ve hit a significant snag. After performing a Docker update, we discovered it inadvertently disrupted all CUDA servers, specifically breaking the WSL CUDA functionality. Attempting to roll back to a previous version was not as straightforward as expected. When I uninstalled Docker to revert to an older version, the uninstall process removed all existing configurations, images, etc. While I have backups of K7, they’re stored on a drive located 9.5 hours away, so getting them locally will take time. As for K6, we lack recent images, so it may require three complete server rebuilds from scratch… While the code files are intact, reconstructing the environment will be a substantial task.

Fortunately, my laptop still has K7 intact, and we’re currently imaging it back to a clean environment. Once we get that back up, we will look at K6 to see how old the last image is. On the bright side, we did not lose any code; all core folders from the servers are backed up, so there is only time lost and frustration to deal with :slight_smile:

Lessons learned:

  1. Always back up your images before a Docker update.
  2. If a Docker update causes issues, it’s better to engage support and wait out a fix rather than attempt an uninstall; unfortunately, a hard lesson today.

At this point, we’re looking at downtime for the rest of the day, though we’re hopeful we can have K7 online sooner once the imaging completes. Thankfully, this update was only applied to the server, sparing all the cloned systems. There was no mention in the release notes suggesting such an impact or any changes to the WSL driver, so this came as an unexpected setback.

Update: should have K7 up in the next hour.

1 Like