With OpenAI, Figure 01 can now have full conversations with people:
- OpenAI models provide high-level visual and language intelligence
- Figure neural networks deliver fast, low-level, dexterous robot actions
Everything in this video is a neural network.
This is pretty cool, but I don’t think the base AI is anything different from what has already been released as an API. The stutter/filler words allow for quicker responses by chunking the TTS-like model’s output. OpenAI’s current TTS returns only complete audio files, which I assume is a limitation of the model; I’ve worked around this in the past by chunking the text at the most natural punctuation (sketched below). The filler words may come from some very light LLM-like model that contextualizes them, or just from random filler-word selection (although that’s less clean). The communication goes through a Whisper-like model -> base LLM -> TTS-like model chain. To control the movements, they are probably using an LLM-agent-like framework that can task-plan and access the different parts of the robot as tools. Pretty cool.
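A minimal sketch of that punctuation chunking, assuming a streaming text source; `token_stream` and `synthesize` are hypothetical stand-ins for the LLM’s streaming output and the TTS engine’s playback call, not real API names:

```python
import re

# Split after sentence-ish punctuation that is followed by whitespace.
BREAK = re.compile(r"(?<=[.!?,;:])\s+")

def speak_streaming(token_stream, synthesize, min_chars=20):
    """Flush buffered LLM text to TTS at natural punctuation so audio
    starts playing before the full response has been generated."""
    buffer = ""
    for token in token_stream:
        buffer += token
        parts = BREAK.split(buffer)
        while len(parts) > 1:
            head = parts.pop(0)
            if len(head) < min_chars:
                # Too short to sound natural alone; merge into the next chunk.
                parts[0] = head + " " + parts[0]
            else:
                synthesize(head)
        buffer = parts[0]
    if buffer.strip():  # flush whatever remains at end of stream
        synthesize(buffer.strip())

# Example, with print standing in for the TTS engine:
demo = iter(["Sure, I see a red apple ", "on the plate, a drying rack, ",
             "and some dishes. ", "Anything else?"])
speak_streaming(demo, synthesize=print)
```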
If I were to build a robot, it would not be completely GPT-driven.
I would have an automation controller working alongside GPT, so that the sensor systems and the automation for basic movements run in real time, while GPT provides the understanding and instructs the automation for various tasks (a rough sketch of that split follows below). The biggest issue with GPT is time delay: anything that moves needs to see in real time to react in real time. You can try to build complex math for predictive movement, and it can be tuned pretty well, but I’m not convinced it’s fast enough to ensure safety around the robot, which limits its applications. That’s still OK if that is what you are aiming for.
Now, if this is a real product, it is impressive, and I look forward to AI speeds that can handle real time.
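A minimal sketch of that split, assuming a fast control loop that never blocks on the model plus a slower planner thread; `read_sensors`, `apply_safety_limits`, `actuate`, and `ask_gpt_for_next_task` are hypothetical stubs, not real robot or OpenAI APIs:

```python
import queue
import threading
import time

commands = queue.Queue()  # planner -> controller handoff

# Hypothetical stubs; a real system would talk to hardware and a model here.
def read_sensors():
    return {"obstacle": False}

def apply_safety_limits(state, goal):
    return "stop" if state["obstacle"] else goal

def actuate(action):
    pass

def ask_gpt_for_next_task():
    time.sleep(2)  # simulate seconds of model latency
    return "pick up the apple"

def control_loop(hz=200):
    """Fast loop: reacts to sensors every tick whether or not the planner
    has produced anything new, so model latency never stalls reflexes."""
    goal, period = None, 1.0 / hz
    while True:
        try:
            goal = commands.get_nowait()  # adopt the latest high-level goal
        except queue.Empty:
            pass
        state = read_sensors()
        actuate(apply_safety_limits(state, goal))
        time.sleep(period)

def planner_loop(n_tasks=3):
    """Slow loop: asks the GPT-style model for the next task and enqueues
    it; its latency only delays new goals, never the control loop."""
    for _ in range(n_tasks):
        commands.put(ask_gpt_for_next_task())

threading.Thread(target=control_loop, daemon=True).start()
planner_loop()  # demo: enqueue a few tasks, then exit
```

The point of the queue is that the control loop only ever does a non-blocking read, so the model’s multi-second latency shows up as slightly stale goals, never as a frozen robot.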
PLEASE do a demo video of asking it to do random things, such as “lay that cup on its side” or “drop the plate mid-hold.” That would be incredible! I think we would all like to see it attempt unconventional tasks so that we may gain further insight into its true capabilities.
This is very cool. Can we NOT make them out of metal, though? You know, just in case of a robot uprising. Maybe make them out of recycled plastic? It’ll keep costs down, and it will be good for the planet.
Here’s the image of a robot designed to be as safe and non-threatening as possible, made entirely out of bubble-wrap. It stands in a friendly pose, suggesting openness and helpfulness, with a welcoming expression on its face.
I would have expected 1x.tech to be the one to demonstrate this level of capability first, given that OpenAI and Microsoft have put a substantial amount of money into that company.
There is a big fight for funding and talent, so it matters greatly who is able to publicly demonstrate proper progress.
Developmental Rubicon… interesting choice of words.
I agree. To me, the training across many actuators working in concert is more impressive than the voice and vision integration.
As for humans, I won’t be too sad to see us go; I don’t think our brains were designed for this kind of society. But what do you think?
I’ve had some experience with robotics and kinematics. A lot of wow factor here. One thing I noticed is the way the set was pre-staged. It might have been even more effective if we had seen the human demonstrator place the objects in front of the robot by sliding them over to it. This would dispel the notion that the movements were all pre-calculated.