For those interested in UAVs (aka uncrewed aerial vehicles, aka drones), the video above is a demonstration of an OpenAI Assistant controlling an ArduPilot quadcopter. The Assistant is integrated into the new Chat module of our MAVProxy ground station, which allows a GPT-4-based OpenAI Assistant to control an ArduPilot vehicle using MAVLink.
The ArduPilot / MAVProxy side of this is all open source and the code can be found here. The module is written in Python and uses OpenAI’s Python Assistants API.
The way it works is that we’ve created a custom Assistant (using gpt-4-1106-preview) running (of course) on the OpenAI servers. The MAVProxy chat window has a record button and a text input field that let the user communicate with the Assistant to ask questions or give commands (e.g. “takeoff to 3m”). The Assistant has been given instructions on how to respond to user queries and requests.
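Creating such an Assistant might look roughly like the sketch below. The instruction text and the takeoff tool schema are illustrative only, not the module’s actual ones:

```python
# Sketch only: creating a custom Assistant similar to the one the Chat
# module uses. The "takeoff" tool schema here is a made-up example in the
# same shape as the .json tool definitions shipped with the module.
TAKEOFF_TOOL = {
    "type": "function",
    "function": {
        "name": "takeoff",  # hypothetical example function
        "description": "Command the vehicle to take off to a given altitude",
        "parameters": {
            "type": "object",
            "properties": {
                "altitude_m": {
                    "type": "number",
                    "description": "target altitude above home, in meters",
                }
            },
            "required": ["altitude_m"],
        },
    },
}

def create_assistant():
    """Create the Assistant on OpenAI's servers (run once per account)."""
    from openai import OpenAI  # requires the openai Python package
    client = OpenAI()          # reads OPENAI_API_KEY from the environment
    return client.beta.assistants.create(
        model="gpt-4-1106-preview",
        name="ArduPilot Vehicle Control",
        instructions="You control an ArduPilot vehicle via the provided functions.",
        tools=[TAKEOFF_TOOL],
    )
```

The user’s chat messages are then appended to a thread tied to that Assistant, and the ground station polls the resulting runs for replies or function-call requests.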
The Assistant has also been extended with a number of custom functions. The full list of functions can be seen in this directory (see the .json files only) and includes general purpose functions like get_current_datetime as well as ArduPilot-specific functions like get_all_parameters and send_mavlink_command_int. In most cases, after the user asks a question or gives a command, the Assistant calls one of these functions. The MAVProxy Chat module is responsible for replying, which it does by providing the latest MAVLink data received from the vehicle or by sending a MAVLink command to the vehicle.
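A minimal sketch of this function-call round trip, assuming the Assistants API’s requires_action/submit_tool_outputs flow. The handler names follow the post, but their bodies here are stubs with placeholder values:

```python
# Sketch of how a ground-station chat module can answer the Assistant's
# function calls. Handler bodies are stubs; real code would read live
# MAVLink state from the vehicle.
import json
from datetime import datetime

def get_current_datetime() -> str:
    return datetime.now().isoformat()

def get_all_parameters() -> dict:
    # Placeholder values; the real module returns parameters heard over MAVLink.
    return {"RTL_ALT": 1500, "WPNAV_SPEED": 1000}

HANDLERS = {
    "get_current_datetime": get_current_datetime,
    "get_all_parameters": get_all_parameters,
}

def handle_tool_call(name: str, arguments_json: str) -> str:
    """Run one requested function and return its output as a JSON string."""
    args = json.loads(arguments_json or "{}")
    result = HANDLERS[name](**args)
    return json.dumps(result)

def answer_run(client, thread_id: str, run):
    """When a run's status is 'requires_action', execute each requested
    tool call locally and submit the outputs back to the Assistants API."""
    outputs = [
        {"tool_call_id": tc.id,
         "output": handle_tool_call(tc.function.name, tc.function.arguments)}
        for tc in run.required_action.submit_tool_outputs.tool_calls
    ]
    client.beta.threads.runs.submit_tool_outputs(
        thread_id=thread_id, run_id=run.id, tool_outputs=outputs)
```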
The module is still quite new, so its functionality is limited, but here are some things I’ve noticed during its development:
- The Assistant is somewhat unpredictable. For example, in the video you’ll see it initially fails to make the vehicle take off but succeeds on the second attempt. Adding functions often helps its reliability: for example, get_location_plus_offset improved its responses to requests like “Fly North 100m” because the Assistant no longer had to do the conversion from meters to lat,lon itself.
- The OpenAI API is a bit laggy. It can sometimes take several seconds to respond to user input. Hopefully this will improve in the future.
- On the plus side, when a new function is added, the Assistant often makes immediate use of it even if we don’t give it specific instructions on when it could be useful. A good example was the wakeup timer functions, which allow the Assistant to set an alarm/reminder for itself to do something in the future. Once added, the Assistant immediately started using them whenever it was given a series of commands (e.g. “takeoff to 3m, fly North 100m, then RTL”).
- The cost of using the Assistant is reasonable for a single user (I averaged $6 USD per day during development) but would be too high for ArduPilot to simply absorb for all our users, so we will need to find a way to pass on the costs. Alternatively, scripts are provided so users or organisations can set up the Assistant using their own OpenAI accounts.
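The meters-to-lat,lon conversion that a helper like get_location_plus_offset spares the Assistant from doing can be sketched as below. This uses a simple spherical-Earth approximation (function name and radius constant are illustrative; they are not the module’s actual implementation), which is accurate enough for short offsets:

```python
# Offset a lat/lon position by a distance in meters, using a
# spherical-Earth approximation. Good enough for short offsets.
import math

EARTH_RADIUS_M = 6378137.0  # WGS-84 equatorial radius

def location_plus_offset(lat_deg, lon_deg, north_m, east_m):
    """Return the lat/lon reached by moving north_m / east_m from lat/lon."""
    dlat = math.degrees(north_m / EARTH_RADIUS_M)
    dlon = math.degrees(east_m / (EARTH_RADIUS_M * math.cos(math.radians(lat_deg))))
    return lat_deg + dlat, lon_deg + dlon

# "Fly North 100m" from a point near Canberra:
lat2, lon2 = location_plus_offset(-35.363262, 149.165237, 100.0, 0.0)
```

With a helper like this, the Assistant only has to pass “100m North” through as numbers rather than reason about geodesy itself.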
There’s still a lot more work to do and hopefully we can make a similar feature available in the other GCSs (Mission Planner, QGroundControl) in the future.
If you’re interested in getting involved we have an “ai” channel on ArduPilot’s Discord server and here is the overall issues and enhancements list we’re planning.
Perhaps the biggest enhancements I’m looking forward to include:
- Improving the Rec button so that it can listen continuously (currently it simply records for 5 seconds)
- Moving to ArduPilot’s Lua scripting interface instead of MAVLink, which will provide even finer control of the vehicle
- Sending the drone’s camera gimbal images to the Assistant
- Moving the chat module to run on the drone itself
Thanks very much to OpenAI for the easy to use Assistant API!
Looks very cool!
I’ve moved the thread to Community and tagged it project … We’d love to hear about updates here in the thread as you make them…
Hah! I remember ArduPilot! Company I was involved with was a sponsor… I seem to remember chatting to someone called Trige? Trignel? something like that. Excellent to see it getting hooked up to ChatGPT.
On your “laggy” comment, have you tried streaming mode? I usually get a response back in under 1000 ms.
On the cost side of things, you will be able to control the size of the retrieval context soon, but I’m not sure where the lion’s share of your token usage is coming from.
Really incredible, you’re living my dream.
Back to post this: if you were able to mix this setup with Neurosity, you’d truly be innovating. If you have the money to spend on it, I’m pretty sure you could easily integrate it with what you already have. You probably wouldn’t even need to change any code.
This is an amazing project, absolutely love it!
I’ve got a couple of questions:
- What flight controller is this running on?
- What is the reasoning for “moving the chat module to run on the drone itself”?
- Have you considered adding TTS?
- What about assistants for mission planning?
Great, yes, you probably spoke with Andrew “Tridge” Tridgell, who is our systems lead and Plane maintainer.
Re streaming mode, I’m very much looking forward to it but I don’t think it’s available yet through the Assistant API (related discussion is here). As soon as that’s available I will certainly make use of it!
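For reference, streaming against the Chat Completions endpoint (which does support it) looks roughly like this sketch; tokens arrive as small deltas that the UI can display as they come, rather than waiting for the whole reply:

```python
# Sketch of streaming with the Chat Completions endpoint (the Assistants
# API did not support streaming when this was written).
def assemble(deltas) -> str:
    """Join streamed text deltas, skipping the None sentinels that
    non-content chunks (e.g. the final chunk) carry."""
    return "".join(d for d in deltas if d is not None)

def stream_reply(client, prompt: str) -> str:
    """Request a streamed completion and return the assembled text."""
    stream = client.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    deltas = (chunk.choices[0].delta.content for chunk in stream)
    return assemble(deltas)
```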
Thanks for the advice and the update on costs. I’ll look into coming up with a breakdown of token usage by function, etc. I think it is probably mostly coming from when the Assistant consumes our 1400 flight code parameters :-).
Thanks very much for the positive feedback! Adding this feature was really fun and interesting.
I used a CubePilot CubeOrange but any of our supported autopilots will work.
Re “moving the chat module to run on the drone”, I should have been clearer: I’d also like to allow the autopilot to communicate directly with an OpenAI Assistant for some tasks without involving a ground station. In particular I’d like to try adding a “safe landing” feature where the autopilot sends an image taken from the drone’s downward-facing camera gimbal and asks the Assistant where the safest place to land is.
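A rough sketch of what that safe-landing query could look like: send one camera frame to a vision-capable model as a base64 data URL alongside the question. The model name, prompt wording, and function names here are illustrative assumptions, not a working feature:

```python
# Sketch of the proposed "safe landing" query: send one frame from the
# downward-facing camera to a vision-capable model and ask where to land.
import base64

def build_image_message(jpeg_bytes: bytes, question: str) -> list:
    """Build a Chat Completions message carrying text plus one image,
    embedded as a base64 data URL."""
    b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }]

def ask_safe_landing(client, jpeg_bytes: bytes) -> str:
    """Ask the model to pick a safe landing spot in the supplied frame."""
    messages = build_image_message(
        jpeg_bytes, "Where is the safest place to land in this image?")
    reply = client.chat.completions.create(
        model="gpt-4-vision-preview",  # vision-capable model at the time
        messages=messages,
        max_tokens=300,
    )
    return reply.choices[0].message.content
```

Turning the model’s free-text answer into an actual landing target (e.g. pixel coordinates mapped to a position offset) would be a further step.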
I’ve definitely considered adding text-to-speech, and MAVProxy actually already has it, but it was written many years ago and sounds very robotic. I’ve hesitated to use OpenAI’s TTS because of the lag… I do want that natural-sounding speech though, so hopefully eventually!
Yes, we’re planning on supporting mission planning! I’m debating whether to next focus on this or vehicle setup (e.g. sensor calibration, etc). I suspect we will do both, it’s just a priority question I think.
Thanks again for the feedback!
Good idea with the safe landing feature, but doing it through API calls might not be the best approach, as conditions on the ground can change rapidly in unpredictable ways (think vehicle and people avoidance).
I only need to setup my vehicle once, but I need to plan a new mission every time I need to fly, so my vote definitely goes to mission planning.
Yes, that’s a good point about mission planning… OK!