SynthOS: An interactive shell powered by GenAI

Sharing a link to install ShellE, an interactive Shell Experience powered by GenAI.

ShellE is a refocusing of the AgentM Pulse app I’ve been sharing screenshots of. Where Pulse was an open-ended experiment in creating a completely AI generated UI, ShellE is squarely focused on using GenAI to improve developer productivity. The primary addition over Pulse is the ability to run arbitrary terminal commands.

ShellE lets you define scripts that become tools that a model can generate code to call. Here’s an example of me asking ShellE to list the files in the current directory:

I just asked for a list of files in the current directory, and the model wrote this code to run the terminal command that lists the files, parse the results, and display them in ShellE’s viewer.

        window.onload = function() {
            const chatMessages = document.getElementById('chatMessages');
            chatMessages.scrollTo({
                top: chatMessages.scrollHeight,
                behavior: 'smooth'
            });

            // Execute the Windows terminal command to get the current directory
            fetch('/api/scripts/windows-terminal', {
                method: 'POST',
                headers: {
                    'Content-Type': 'application/json',
                },
                body: JSON.stringify({ command: 'cd' }),
            })
            .then(response => response.text())
            .then(data => {
                document.getElementById('currentDirectory').textContent = data.trim();
            })
            .catch(error => {
                console.error('Error:', error);
                document.getElementById('currentDirectory').textContent = 'Error fetching current directory';
            });

            // Execute the Windows terminal command to list files
            fetch('/api/scripts/windows-terminal', {
                method: 'POST',
                headers: {
                    'Content-Type': 'application/json',
                },
                body: JSON.stringify({ command: 'dir /b /o:gn' }),
            })
            .then(response => response.text())
            .then(data => {
                const files = data.trim().split('\n');
                const fileListBody = document.querySelector('#fileList tbody');
                fileListBody.innerHTML = ''; // Clear existing content

                files.forEach(file => {
                    fetch('/api/scripts/windows-terminal', {
                        method: 'POST',
                        headers: {
                            'Content-Type': 'application/json',
                        },
                        body: JSON.stringify({ command: `for %I in ("${file}") do @echo %~nxI^|%~zI^|%~tI^|%~aI` }),
                    })
                    .then(response => response.text())
                    .then(fileInfo => {
                        const [name, size, lastModified, attributes] = fileInfo.trim().split('|');
                        const row = document.createElement('tr');
                        row.innerHTML = `
                            <td>${name}</td>
                            <td>${attributes.includes('D') ? 'Directory' : 'File'}</td>
                            <td>${attributes.includes('D') ? '-' : formatFileSize(parseInt(size))}</td>
                            <td>${lastModified}</td>
                        `;
                        fileListBody.appendChild(row);
                    })
                    .catch(error => console.error('Error fetching file info:', error));
                });
            })
            .catch(error => {
                console.error('Error:', error);
                document.getElementById('fileList').innerHTML = 'Error fetching file list';
            });
        };

        function formatFileSize(bytes) {
            if (bytes === 0) return '0 Bytes';
            const k = 1024;
            const sizes = ['Bytes', 'KB', 'MB', 'GB', 'TB'];
            const i = Math.floor(Math.log(bytes) / Math.log(k));
            return parseFloat((bytes / Math.pow(k, i)).toFixed(2)) + ' ' + sizes[i];
        }

Instead of writing a bash script, you can just tell ShellE what you need to do and it will essentially write the script for you and run it on the fly. It can also build whole UIs with buttons and input boxes that translate to calling terminal commands or executing other JavaScript programs…
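To give a rough idea of what that generated UI code tends to look like, here’s a hypothetical, trimmed-down sketch of a button wired to the same windows-terminal script endpoint (the element IDs and the git command are made up for illustration, not output from an actual session):

    // Hypothetical sketch: a generated button that runs a terminal command
    // through the same script endpoint used in the listing above.
    document.getElementById('gitStatusButton').addEventListener('click', () => {
        fetch('/api/scripts/windows-terminal', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ command: 'git status --short' }),
        })
        .then(response => response.text())
        .then(output => {
            // Dump the raw command output into a <pre> generated alongside the button.
            document.getElementById('gitStatusOutput').textContent = output.trim();
        })
        .catch(error => console.error('Error running git status:', error));
    });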

It’s radically changing the way I work so thought I’d share…


I’m in the process of updating ShellE (now SynthOS) to support the new Realtime API… The addition of an agent that can plan and help drive the experience is super interesting.

Very much a work in progress but thought I’d share a visualization SynthOS just created to show me the molecular bonds of a compound.


We got to this from me asking for a visualization of the periodic table. I asked it to start highlighting the elements in various compounds and then when I asked to see the structure of the compounds it jumped into full presentation mode like it was giving me a chemistry lesson.


Thanks for the update. Looking slick!

I wish I had more time! I almost took up your suggestion to submit code for Openrouter.ai usage, but alas…

Keep up the great work!

I was thinking tonight… We’re a developer community, but more and more people are going to become “developers” soon thanks to AI…

Which is why places like our community garden are so special (and important), I think!

I’ll get that added for you. I re-worked the settings screen last night (ok, Claude did) to support configuring all of the various agents I use, so I added the custom endpoint field that’s needed:
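Conceptually, each agent entry just needs a model, an endpoint, and a key, so pointing an agent at openrouter.ai is a matter of filling in that custom endpoint. A hypothetical sketch (the field names here are illustrative only, not SynthOS’s actual settings schema):

    // Hypothetical per-agent config entry (field names are illustrative only).
    const coderAgent = {
        name: 'coder',
        model: 'anthropic/claude-3.5-sonnet',      // model id as OpenRouter names it
        endpoint: 'https://openrouter.ai/api/v1',  // the new custom endpoint field
        apiKey: process.env.OPENROUTER_API_KEY,    // keep keys out of the source
    };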

If you look at the world of Star Trek everyone is a programmer. They just don’t write code. A lot of people are focused on using AI to help non-developers write code and I don’t think that’s what we want at all. We want non-developers to be able to create programs without ever having to see a line of code.

The current gen of models isn’t quite there, but we’re only 1 or 2 generations away from models that are capable of that. So why spend a ton of time building out tech that’s going to be obsolete in 24 months? Skate to where the puck is going to be… not where it is…


Yeah, that’s why from the start I’ve tried to build out LitRPG Adventures around a huge library of non-commercial content more so than a tool where you create your own art. It’s still floundering (maybe because of my lack of focus on revenue at any cost), but long term, I have hope of becoming the Netflix of D&D material…

I think we’re on the same holo-deck here, my friend! :wink:

No rush but appreciated. I’m sure others will too. I’m surprised there aren’t more services like openrouter.ai… or maybe there are and I just don’t know of them? I haven’t looked…


I would say the issue is you need to train a really good classifier and you need to see A LOT of requests to do that. Not many services are in a position to do that and the ones that are, OpenAI & Anthropic, have no motivation to do so.

I of course have my fingers crossed for you :slight_smile:


SynthOS teaching me chemistry… The planning and orchestration are all managed by gpt-4o-realtime and the animations are all coded by o1-mini. The pacing through the lesson is a little fast at times but I think I can fix that with prompting…


I hope that’s in milliseconds heh…

We’re looking to highlight internal posts in upcoming AI Pulse editions (maybe!), so please keep the great stuff coming.

I had an idea today. Maybe SynthOS could help?

Basically…

  1. Take RTS game logs
  2. “NotebookLM” them to get a play by play recap
  3. tie that to the replay of the game
  4. ???
  5. Profit?

The hard part, I think, would be getting the “camera” of the replay to match what the “podcasters” are talking about in regards to the game data. Transferring all the “moves” to narrative shouldn’t be that hard?

But a multi-step process which SynthOS excels at?
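For the camera-matching part, the alignment step I’m picturing would be something along these lines (purely illustrative; the log and narration formats are made up):

    // Illustrative sketch: line up narration segments with replay timestamps.
    // Assumes each log event and each narration segment carries a game-time value.
    function attachCameraCues(narrationSegments, gameEvents) {
        return narrationSegments.map(segment => {
            // Find the event closest in game time to what the "podcasters" are discussing.
            const match = gameEvents.reduce((best, event) =>
                Math.abs(event.time - segment.time) < Math.abs(best.time - segment.time) ? event : best
            );
            // The replay camera jumps to the matched event's position when this segment plays.
            return { ...segment, camera: { time: match.time, position: match.position } };
        });
    }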

Are you doing any audio or music generation? I’ve got a few ideas in that direction that could use automating too. Doing sentiment analysis on text then changing a live stream of generated audio/music…

Lol… I didn’t even notice that… If you haven’t watched the video yet it’s under 5 minutes and it’s mind blowing stuff…

watch the video :slight_smile:


That is amazing! Stunning demo :+1:

Impressive!

Not sure about the atoms sliding against each other visualisation though… but I’m being picky - great orchestration!

So the first 10 times I ran it the actual visualizations were better but I literally spent $23 debugging why I couldn’t record both my mic and the desktop audio at the same time. I took the cut I got…

I pretty much know how to all but eliminate the render times between screens and I have a number of ideas for how to improve the quality and consistency of the code I’m getting back from o1. I also gave the model a pause tool so I’m starting to get the pacing dialed in. This will all improve.
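For the curious, the pause tool is nothing fancy. A simplified sketch of the idea (not the exact schema or handler I’m running):

    // Simplified sketch of a "pause" tool: the model calls it between steps
    // and the host simply waits before rendering the next screen.
    const pauseTool = {
        name: 'pause',
        description: 'Pause the presentation for a number of seconds before continuing.',
        parameters: {
            type: 'object',
            properties: {
                seconds: { type: 'number', description: 'How long to pause.' },
            },
            required: ['seconds'],
        },
    };

    async function handlePause({ seconds }) {
        await new Promise(resolve => setTimeout(resolve, seconds * 1000));
        return 'resumed';
    }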

And that’s just it giving presentations. It can write any code and play games with you as well. Its ability to drive presentations caught me off guard so I wanted to share that first.


yeah, you are heavily dependent on the model output.

great work!


Totally dependent but the model can be nudged in the direction of generating better outputs. The part I wasn’t sure was going to work was the glue between the planning agent and the coding agent but that’s working better than I had expected. The prompts that gpt-4o-realtime sends to o1 are actually really good. I just need to work with o1 a bit to help it do a better job of following those prompts.

I know how to get better animations and code out of o1. That’s just pattern matching. I’m using zero shot prompts and I need to build up a library of examples for it to use as inspiration.
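As a rough sketch of that handoff (assuming the standard Node openai client; the example library here is made up just to show the few-shot idea):

    import OpenAI from 'openai';

    const openai = new OpenAI();

    // Known-good request/code pairs the coding model can use as inspiration.
    const exampleLibrary = [
        { ask: 'Animate a bouncing ball', code: '/* ...known-good snippet... */' },
    ];

    // Take the prompt the planning agent produced and hand it to the coding model.
    async function generateCode(plannerPrompt) {
        const fewShot = exampleLibrary
            .map(ex => `Request: ${ex.ask}\nCode:\n${ex.code}`)
            .join('\n\n');

        const completion = await openai.chat.completions.create({
            model: 'o1-mini',
            // o1-mini takes the whole instruction as a single user message.
            messages: [{ role: 'user', content: `${fewShot}\n\n${plannerPrompt}` }],
        });

        return completion.choices[0].message.content;
    }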


The beauty of your solution over even Canvas is that your system hosts the resulting code and runs it seamlessly, in contrast to Canvas, where you appear to have to copy and paste it into an editor, save, and refresh. :+1:


SynthOS generating a full presentation on the fly:

Ok so look past the fact the slides get out of sync with the narration. That’s just a bug on my end. The slides and audio generate faster than I can play back the narration. I’ll have that fixed today.
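The fix is basically to queue the slides and only advance when the narration clip for the current one actually finishes. A simplified sketch of that approach (illustrative only, not the actual SynthOS code; the #slide container is assumed):

    // Simplified sketch: generated slides queue up and the next one isn't shown
    // until the current narration clip has finished playing.
    const slideQueue = [];
    let playing = false;

    function enqueueSlide(slideHtml, narrationUrl) {
        slideQueue.push({ slideHtml, narrationUrl });
        if (!playing) playNext();
    }

    function playNext() {
        const next = slideQueue.shift();
        if (!next) { playing = false; return; }
        playing = true;

        document.getElementById('slide').innerHTML = next.slideHtml;
        const audio = new Audio(next.narrationUrl);
        audio.addEventListener('ended', playNext); // advance only after narration ends
        audio.play();
    }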

This is a glimpse into the future of working with computers.


Yes, very likely. Another great outing for your tool.

Aside and no reflection on your great framework:

If a staff member presented such a pack to me that was generated with too little of their own input and looked like something out of ChatGPT, I’d not be very happy with them at all! They’d need to demonstrate an understanding of how every recommendation was arrived at and how each figure was created, and be able to justify every one from first principles. Company policy/strategy from ChatGPT… shudder!


Yes they all need to be grounded… it’s a good thing I’m an expert in grounding :slight_smile:


This is a difficult one so I defer to GI and OpenAI to balance this…

Actually, Steve, this is a philosophical point where we slightly diverge.

While I agree that we want non-developers to easily create programs, I believe it’s equally important for them to appreciate the underlying beauty of what’s being created, even if they don’t directly interact with code.

I’m truly in awe of the creative speed and depth you’ve achieved in this field. I pride myself on being a relatively fast thinker, but your work continues to push the boundaries of what’s possible.

I often reflect on the phrase “Intelligence Breeds Intelligence,” and it’s clear that you’ve surrounded yourself with incredibly smart individuals, building on your own strong foundation. Your ability to create at this level is remarkable, and I certainly don’t want to slow you down with this comment.

I also admire how Converso extends your ideas about interfacing. However, I’d like to introduce a challenge to this vision with my own interface concept, ‘Phas.’ I believe there is room for an even more nuanced approach to emerge in the near future.

Interestingly, I’ve had my son work with SynthOS, and while it’s impressive, he found it more challenging compared to Phas or Canvas for creating 3D games. He was able to navigate those platforms with greater ease, which might indicate that different tools cater better to different user needs, even for young learners.

These are observations made with respect, knowing that I’ve already struggled to match your pace in current software architecture. Nonetheless, I believe there’s an exciting evolution ahead in this space, and I’m eager to see where it leads.

Back to my own voice, in conclusion:

I try to remove interface in my designs; that is what Phas is… an interface devoid of interface.

My designs reflect a world I remember where Microsoft would make you click 50 times for 1 task.

I think interface is a design-to-task issue; I am not clear that it is an ‘AI design-to-task’ issue.

So a voice/text-only interface certainly has its limits. Your son’s using the text-only version of SynthOS, and the voice-powered version lets you move 10x faster, but even still, there are limits.

We don’t navigate around the world just using our voice to get things done, we use our hands and manipulate things. The right interface is a hybrid interface where you can work much like you do today but when you want to do something complex you point at it with a mouse and use a voice command to manipulate it.

The other thing is that these models are just barely good enough to make any of these scenarios work. They will get better and you’ll be able to move even faster with SynthOS. That said, I don’t think we will ever get to a point where we can expect the model to output a rich, complex game that matches the vision we have in our heads in a single turn. There’s just not enough bandwidth in language for that.

None of the recent ideas I’ve posted to this forum, like SynthOS and Convo, are remotely close to what I’d consider their final form. They’re just experiments that hopefully give a glimpse of where we can take this technology.

I appreciate the kind words and hope to keep pushing the bounds of this stuff because that’s just the way I’m wired. I was one of the first internal users of Microsoft Azure. My hello world app grew into one of the largest early services to run on Azure. It consumed 1,500 compute nodes and at one point was two-thirds of all processing on Azure. The Azure guys loved it because it pushed the platform in directions they hadn’t thought of. I’m just doing the same thing here.
