Letting OpenAI GPT-4o mini control my phone

Since Claude Computer Use was pretty known, thought of using GPT 4o/mini to try and control my phone. Got great results.

Some of the things i tried - “Draft a gmail to a friend asking for lunch” or “Start a 3+2 game on lichess” or “Find the bus stops in Alanson” or “Find my rating in uber app” or “Order or ” etc.

Decided to make it open source (demo included): https://github.com/BandarLabs/clickclickclick

The Planner (which tells the next step) works great with GPT 4o/mini. The Finder (which finds elements on the screen) didn’t work as good with GPT 4o - so used Gemini / Molmo.

The cost is approximately 10x lower than Claude’s computer use.

Planner (via GPT 4o) uses vision + previous actions to plan next step using tool calling.

3 Likes