Claude Computer-Use - Do you think OpenAI has something similar

I tested Anthropic’s (Claude) computer-use Demo last week and it is pretty interesting. Its limited, slow, and buggy but you can do some pretty cools things still.

Has anybody seen any news or videos hinting that openai but have something similar?

It is a really interesting concept and way to automate tasks or research. I know they have mentioned an o1 model that could think for weeks, months, etc… but I do not see how this would be useful without a feedback loop (google searches, data extraction, analysis) adding new data or context.

I was thinking if you could train a model on a series of videos, where you are performing a specific task, using an existing app (photoshop, google sheets, notion) and finetune it to perform the task the way you prefer, that would be extremely useful…

It could get to the point where you have an AI worker, running in a container, performing marketing, research, analysis, and other tasks. It is pretty crazy to think about.

1 Like

Regarding your last sentence - I did exactly this with just API calls - essentially different “agents” under the hood - one collecting market dynamics for a target company, another creating a cultural assessment, another collecting M&A news, etc (the context is private investment). There was no need for anything like what Anthropic demonstrated - ChatCompletions API with GPT-4o and feeding it right data sources to query and crawl was more than enough.

I don’t see why it wouldn’t be possible to swap out Claude and just use GPT-4o or o1 family of models. My feeling is that people are still trying to understand a very concrete use case, beyond just a cool demo. RPA has been around for a very long time now.

2 Likes

That sounds great. I am more thinking of scenarios that are bit more complex such as using a specific app, like Canva, and getting it to create an infographic or slideshow. Or some type of multi-tool workflow where you use different tools that may not have built-in integrations.

For example, “help me verify these engineering calculations, then open AutoCAD and create an engineering diagram for {x} part”…

Yes, I’m sure there are some use cases out there like that. What immediately comes to my mind is accessibility use cases - for people that may have problems with mobility this would be super useful.

1 Like

Ya especially when combined with voice control and/or voice feedback. That is another great use case.