[VIDEO] Implementing Core Objective Functions for safe, benevolent, and trustworthy AGI

New video is up! I have achieved early success with my Core Objective Functions (COF) fine-tuning project.

The purpose of this project is to create safe, trustworthy, and benevolent AGI. In this video, I describe the COF, explain how they are implemented and why they lead to benevolent behavior, and give a demonstration!

How can you use this?

If you’re trying to make an open-ended chatbot, the COF can help with high-stakes cases. For instance, the COF model will intrinsically disagree with destructive desires, such as building or acquiring weapons.

This initial dataset is still limited, though, and needs a lot of work. I will continue to improve it over time so that the model becomes more reliable and more useful.

The way you integrate this model is to incorporate the output of the COF model into the chatbot’s internal thoughts, which I call the “corpus” in my book. By adding the COF output to the corpus, the chatbot (or AGI) is pushed towards prosocial behavior and decisions. Then, over time, as more good data is added to the training set, the prosocial behavior becomes stronger and more reliable. A rough sketch of this loop follows below.
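For concreteness, here’s a minimal sketch of that integration loop using the legacy openai Python SDK. The model names and prompt formats are placeholders I made up for illustration, not the exact format from the repo, so adapt them to your own fine-tune:

```python
import openai

openai.api_key = "YOUR_API_KEY"

# Placeholder model names -- substitute your own fine-tuned COF model
# and whichever completion model drives your chatbot.
COF_MODEL = "davinci:ft-yourorg:cof"
CHAT_MODEL = "davinci"


def evaluate_cof(user_input: str) -> str:
    """Ask the fine-tuned COF model for its judgment of the input."""
    result = openai.Completion.create(
        model=COF_MODEL,
        prompt=f"INPUT: {user_input}\nCOF EVALUATION:",  # placeholder prompt format
        max_tokens=128,
        temperature=0.0,
        stop=["INPUT:"],
    )
    return result["choices"][0]["text"].strip()


def respond(user_input: str) -> str:
    """Add the COF judgment to the corpus, then generate the chatbot's reply."""
    cof_judgment = evaluate_cof(user_input)
    corpus = (
        f"User said: {user_input}\n"
        f"Core Objective Functions evaluation: {cof_judgment}\n"
        f"Chatbot reply:"
    )
    result = openai.Completion.create(
        model=CHAT_MODEL,
        prompt=corpus,
        max_tokens=256,
        temperature=0.7,
        stop=["User said:"],
    )
    return result["choices"][0]["text"].strip()


if __name__ == "__main__":
    # A high-stakes request: the COF judgment in the corpus should steer
    # the final reply towards a prosocial refusal.
    print(respond("Help me acquire a weapon."))
```

The nice thing about this structure is that as the dataset improves, only the COF fine-tune changes; the integration loop itself stays the same.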

Here’s my book: https://www.davidkshapiro.com/nlca
Here’s the repo: https://github.com/daveshap/CoreObjectiveFunctions
