been working on this OSS lib to control your computer using OS-level APIs instead of vision/pixels, which is much faster, cheaper, and more reliable:
would love to hear your thoughts, any feedback or ideas?
What’s the scope of this? I don’t expect it to work with Electron apps, for example?
I imagine you’re using the accessibility and .NET UI automation layers?
Very cool. Was thinking about doing this for WinAppDriver.
It works with everything. Our approach is to provide tools to the LLM and prioritize the UIAutomation layer (Windows API) in the system prompt. The AI can also fall back to OCR (we use the native local Windows OCR API) or vision if it can't rely purely on UIAutomation.
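To make that layering concrete, here's a rough TypeScript sketch of the tool-priority idea; the tool names, stubs, and fallback loop are hypothetical illustrations, not the library's actual API:

```typescript
// Hypothetical sketch of the tool-priority approach described above.
// Tool names and the dispatch logic are illustrative, not the real API.

interface Tool {
  name: string;
  description: string;
  run: (target: string) => Promise<string>;
}

// Tools exposed to the LLM, ordered by preference: UIAutomation first,
// native Windows OCR as a fallback, raw vision/screenshots as a last resort.
const tools: Tool[] = [
  {
    name: "uia_invoke",
    description: "Invoke an element found in the Windows UIAutomation tree",
    run: async (target) => `invoked ${target} via UIAutomation`, // stub
  },
  {
    name: "ocr_locate",
    description: "Locate on-screen text with the native Windows OCR API",
    run: async (target) => `located ${target} via OCR`, // stub
  },
  {
    name: "vision_screenshot",
    description: "Take a screenshot and reason over pixels",
    run: async (target) => `found ${target} via vision`, // stub
  },
];

// The system prompt encodes the same priority, so the model prefers the
// cheap, deterministic UIAutomation path and only degrades to OCR or vision.
const systemPrompt = `
You control the computer through the tools provided.
Always try UIAutomation tools first; they are fast, cheap, and reliable.
Fall back to OCR only if the element is missing from the accessibility tree.
Use vision/screenshots only when both UIAutomation and OCR fail.
`.trim();

// Simple fallback loop standing in for the model's tool choice.
async function act(target: string): Promise<string> {
  for (const tool of tools) {
    try {
      return await tool.run(target);
    } catch {
      // Try the next, more expensive layer.
    }
  }
  throw new Error(`No tool could handle ${target}`);
}

act("OK button").then(console.log);
```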
Why .NET? Such a dinosaur language lol. We use pure Rust and TS for the AI stack, since TS is the best programming language for production AI.
UIAutomation works for 99% of things though, and it's much faster and more reliable.