OpenAI ChatGPT + Robot = Figure 01



With OpenAI, Figure 01 can now have full conversations with people:

- OpenAI models provide high-level visual and language intelligence
- Figure neural networks deliver fast, low-level, dexterous robot actions

Everything in this video is a neural network:


Programmed to stutter

nice :thinking:

I think :laughing:


I noticed that! Very uncanny valley!

It was a very “natural” sounding voice… Just all the pieces coming together…


the renders look nice, what was used to generate the videos?


Which ones? Not sure…

Pause words, sneaky way to buy it a little more, um, what do you call it, uh, inference time :laughing:

And I wonder if that gravelly voice is a way to disguise a speech generation model optimized for speed over quality.


Heh… stream (Uh-huh) stream …

I’m hearing a lot of non-tech people saying the voice is faked…

Another take on YT…


This is pretty cool, but I don’t think the base AI is anything different from what has already been released as an API. The stutter/filler words allow for quicker responses by chunking the TTS-like model’s output. OpenAI’s current TTS endpoint returns only a complete audio file, which I assume is a limitation of the model; I’ve worked around this in the past by chunking the text at the most natural punctuation. The filler words may come from a very light LLM-like model that contextualizes them, or just from random filler-word selection (although that’s less clean).

The communication goes through a Whisper-like model → base LLM → TTS-like model chain. To control the movements, they are probably using an LLM-agent-like framework that can task-plan and access the different parts of the robot as tools. Pretty cool.
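To make the chunking idea concrete, here is a minimal sketch of splitting text at natural punctuation so each piece can be sent to a TTS endpoint as soon as it is ready, instead of waiting for the full response. The function name and the `max_len` threshold are my own illustration, not anything OpenAI or Figure has published:

```python
import re

def chunk_at_punctuation(text, max_len=80):
    """Split text into TTS-sized chunks at natural punctuation boundaries."""
    # Split after pausing punctuation, keeping the delimiter attached.
    pieces = re.split(r'(?<=[.,;:!?])\s+', text.strip())
    chunks, current = [], ""
    for piece in pieces:
        if current and len(current) + len(piece) + 1 > max_len:
            chunks.append(current)
            current = piece
        else:
            current = f"{current} {piece}".strip()
    if current:
        chunks.append(current)
    return chunks

text = ("I see a red apple on a plate, a drying rack with cups, "
        "and you standing nearby with your hand on the table.")
for c in chunk_at_punctuation(text):
    print(c)  # each chunk can be handed to the TTS call as soon as it exists
```

Each chunk is synthesized and played while the next one is still being generated, which hides most of the end-to-end latency the thread is speculating about.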

It does look like we’re living in an age where first impressions pummel logic to death.


It sounds just like the voice from the ChatGPT App?

I’m questioning if the movements were pre-programmed at all or if the LLM genuinely instructed it on what to do…
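If the LLM genuinely drives the motions, one plausible mechanism is the tool/function-calling pattern mentioned earlier in the thread: expose each motion primitive as a tool the model can call. This sketch uses the OpenAI function-calling schema, but the tool names (`pick_up`, `hand_over`) and the dispatcher are hypothetical — Figure has not published its actual interface:

```python
# Hypothetical motion primitives described in the OpenAI tool schema.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "pick_up",
            "description": "Grasp a named object in the workspace.",
            "parameters": {
                "type": "object",
                "properties": {"object": {"type": "string"}},
                "required": ["object"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "hand_over",
            "description": "Pass the held object to a person.",
            "parameters": {
                "type": "object",
                "properties": {"object": {"type": "string"},
                               "to": {"type": "string"}},
                "required": ["object", "to"],
            },
        },
    },
]

def dispatch(call):
    """Route a model-emitted tool call to the motion subsystem (stubbed here)."""
    handlers = {
        "pick_up": lambda a: f"grasping {a['object']}",
        "hand_over": lambda a: f"handing {a['object']} to {a['to']}",
    }
    return handlers[call["name"]](call["arguments"])

# TOOLS would be passed with the chat request; the model's tool_calls
# would then be routed through dispatch() to the low-level controllers.
print(dispatch({"name": "pick_up", "arguments": {"object": "apple"}}))
```

Under this design the LLM only decides *which* primitive to run; the learned low-level policies still do the actual motion, which would explain how pre-trained behaviors and live instruction coexist.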

If I were to build a robot, it would not be completely GPT-driven.
I would have an automation controller working with GPT, so that the sensor systems and the automation for basic movements run in real time, while GPT instructs the automation for various tasks. The biggest issue with GPT is time delay: anything that moves needs to see in real time to react in real time. You can try to build complex math for predictive movements, and that can be tuned pretty well, but I am not convinced it is fast enough to ensure safety around the robot, which limits its applications. That is still OK if that is what you are aiming for.

Now, if that is a real product, it is impressive, and I look forward to AI speeds that can handle real time :slight_smile:

PLEASE do a demo video of asking it to do random things such as “lay that cup on its side” or “drop the plate mid-hold.” That would be incredible! I think we would all like to see it attempt unconventional tasks so that we may gain further insight into its true capabilities.

This is very cool. Can we NOT make them out of metal though? You know, just in case of a robot uprising. Maybe make them out of recycled plastic, it’ll keep costs down, and it will be good for the planet?

Here’s the image of a robot designed to be as safe and non-threatening as possible, made entirely out of bubble-wrap. It stands in a friendly pose, suggesting openness and helpfulness, with a welcoming expression on its face.


Perfect. It can also be easily shipped.


Welcome to the site, by the way.

Hope you stick around. We’ve got a great AI dev community growing here.


I would have expected that would be the one to demonstrate this level of capability first, given that OpenAI and Microsoft have put a substantial amount of money into that company.
There is a big fight for funding and talent, so it matters greatly who is able to publicly demonstrate proper progress.

The neural autonomous learning is amazing, and the dexterity feels like it’s crossed a developmental Rubicon. The question is what becomes of humans…?

Developmental Rubicon… interesting choice of words :sweat_smile:
I agree; to me, the training across many actuators working in concert is more impressive than the voice and vision integration.
As for humans, I won’t be so sad to see humans go; I don’t think our brains were designed for this kind of society. But what do you think?

I’ve had some experience with robotics and kinematics. A lot of wow factor here. One thing I noticed is the way the set was pre-staged. It might have been even more effective if we had seen the human demonstrator place the objects in front of the robot by sliding them over to it. This would dispel the notion that the movements were all pre-calculated.