Putting it all together in Robotics

Anyone who can build this, please feel free.

You will need:
A GPT-3 API key
A Boston Dynamics robot, or an alternative
A YOLO object-detection network (turns camera images into text labels)
A speech-to-text network
A text-to-speech network
A Raspberry Pi or other portable computer
A small microphone, speaker, and video camera

Keep as much history in the GPT-3 text prompt as will fit, then delete the oldest entries to make room for new ones.
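A minimal sketch of that rolling memory might look like the following. The class name `PromptMemory` and the `max_chars` budget are made up for illustration; a real build would probably count tokens rather than characters, since that is how the model's limit is measured.

```python
from collections import deque


class PromptMemory:
    """Rolling prompt history: keeps the newest lines that fit in a budget."""

    def __init__(self, max_chars):
        # Hypothetical budget in characters (a stand-in for a token limit).
        self.max_chars = max_chars
        self.lines = deque()

    def render(self):
        # The prompt text is simply the remembered lines, oldest first.
        return "\n".join(self.lines)

    def add(self, line):
        self.lines.append(line)
        # Delete the oldest lines until the prompt fits again.
        # The newest line is always kept, even if it alone is too long.
        while len(self.lines) > 1 and len(self.render()) > self.max_chars:
            self.lines.popleft()
```

Each new observation or action would go through `add`, and `render` would give the text to send to the model.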

Have the Raspberry Pi listen to and watch the room, with the speech-to-text and YOLO networks running locally and GPT-3 accessed online over a wireless connection.

So GPT-3 can see something like:

object “TV”
object “table”
human voice 7 said “hey, it’s the robot”
human voice 9 said “yeah it’s good isn’t it”
human voice 7 said “I agree, it is”
object “carpet”
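As a rough sketch of how those lines could be produced, the code below turns detector and speech events into prompt text. The `Detection` and `Utterance` types are hypothetical stand-ins for whatever the YOLO and speech-to-text networks actually return, and it uses plain straight quotes.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical result from the object detector."""
    label: str


@dataclass
class Utterance:
    """Hypothetical result from speech to text, with a speaker number."""
    voice_id: int
    text: str


def to_prompt_line(event):
    # Objects become: object "label"
    if isinstance(event, Detection):
        return f'object "{event.label}"'
    # Speech becomes: human voice N said "text"
    if isinstance(event, Utterance):
        return f'human voice {event.voice_id} said "{event.text}"'
    raise TypeError(f"unsupported event: {event!r}")
```

Telling voices apart (voice 7 versus voice 9) would need some kind of speaker identification, which this sketch simply assumes has already happened.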

It then has the option to invoke the text-to-speech network with “say”, or to request some basic movement with “move” or “turn”, with the GPT-3 prompt updated as it goes. For example:

say “yes I am a robot”
move a unit forward
turn 90 degrees left
move a unit forward
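Something has to read those lines back out of the completion and hand them to the speaker or the motors. Here is one possible parser, as a sketch only: the tuple format it returns, the numeric unit counts, and the "backward" and "right" options are my assumptions, not part of the examples above.

```python
import re


def parse_action(line):
    """Turn one completed line into an action tuple, or None if it is not one."""
    line = line.strip()

    # say "text"  (straight or curly quotes)
    m = re.fullmatch(r'say\s+["\u201c](.*)["\u201d]', line)
    if m:
        return ("say", m.group(1))

    # move a unit forward / move 3 units backward (numbers are an assumption)
    m = re.fullmatch(r'move (a|\d+) units? (forward|backward)', line)
    if m:
        units = 1 if m.group(1) == "a" else int(m.group(1))
        return ("move", units, m.group(2))

    # turn 90 degrees left / right
    m = re.fullmatch(r'turn (\d+) degrees (left|right)', line)
    if m:
        return ("turn", int(m.group(1)), m.group(2))

    # Anything else (observations, chatter) is not an action.
    return None
```

Lines that do not parse are ignored rather than guessed at, which seems the safer default for something driving a physical robot.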

There’s a psychological idea that we humans are (organic) prediction machines: our predictions of the near future include what we ourselves will do (as well as what others will do), and so we then do those things.

GPT-3 is all about completion! Even if it’s not the best tool for the job it should still be great at this.

Beyond this, an immutable part at the beginning of the prompt could instruct it to occasionally rework the whole prompt, compressing it.

So if the whole prompt looked like:

say “hmm I’m not sure, I’ll have a think”
object “cat”
human voice 12 said “we just need to do X”
compress “say not sure, now I know it’s X [new line] object (cat)”

Then it changes to:

say: not sure, now I know it’s X
object “cat”
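The compress step could be handled by a small function like the one below. This is a sketch under a few assumptions: the compress request is the last line, "[new line]" is the separator, and the payload is taken literally as the new history (so formatting inside it is whatever the model wrote). The immutable part of the prompt is assumed to be stored separately and is not touched.

```python
import re


def apply_compress(lines):
    """If the last line is a compress request, replace the history with its payload."""
    if not lines:
        return lines
    # compress "text"  (straight or curly quotes)
    m = re.fullmatch(r'compress\s+["\u201c](.*)["\u201d]', lines[-1].strip())
    if m is None:
        return lines  # nothing to do, keep the history as it is
    # "[new line]" marks where the compressed text splits into separate lines.
    return [part.strip() for part in m.group(1).split("[new line]")]
```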

Another improvement could be to give it a series of other documents that it could search to look things up, based, as usual, on instructions in the immutable part of the prompt.

search “yesterday”
found “saw cat, worked something out having thought it through, humans were pleased”
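The lookup itself could start out very simple. The sketch below assumes the documents are kept as a dictionary of title to text (for example one summary per day) and does a plain substring match; the `notes` structure and the "found nothing" reply are invented for illustration, and a real version might want something smarter than substring matching.

```python
def search_notes(query, notes):
    """Return a 'found' line for the first note whose title or text matches."""
    query = query.lower()
    for title, body in notes.items():
        if query in title.lower() or query in body.lower():
            return f'found "{body}"'
    # Hypothetical reply when no note matches.
    return "found nothing"
```

The returned line would be appended to the prompt just like any other observation.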

Put all this together in an open-ended loop and you’ve got a robot with agency.
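For what it's worth, the loop might be sketched like this. Everything here is an assumption about how the pieces would be wired up: `sense` stands in for the camera and microphone networks, `complete` stands in for the call to GPT-3 (no real API is used here), `act` stands in for the speaker and motors, and `max_steps` exists only so the sketch can be stopped.

```python
def run_agent(sense, complete, act, preamble, max_chars=2000, max_steps=None):
    """Sense, trim the prompt, ask the model for the next line, act, repeat."""
    lines = []
    step = 0
    while max_steps is None or step < max_steps:
        # 1. Add new observations (objects seen, speech heard) to the history.
        lines.extend(sense())

        # 2. Delete the oldest lines until the prompt fits the budget.
        while lines and len(preamble) + sum(len(l) + 1 for l in lines) > max_chars:
            lines.pop(0)

        # 3. Ask the model to continue the prompt with the next line.
        prompt = "\n".join([preamble, *lines]) + "\n"
        action = complete(prompt).strip()

        # 4. Record the model's line in the prompt and carry it out.
        if action:
            lines.append(action)
            act(action)
        step += 1
    return lines
```

Passing the three callables in keeps the loop itself independent of any particular robot or API, so it can be tried with fake inputs first.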

I cannot, but please feel free to try; it’s more important that it gets done than that I do it. Even if it goes off the rails really quickly, that’s still worth knowing.