tl;dr I wrote a blog post about having an “incentive to label” and wanted to re-share here.
I’m not sure if cross-posting a personal blog is “kosher”, but I’m happy to take this post down if it causes an issue.
Last April I wrote a blog post titled An Incentive To Label [1]. It is the result of work I did at Magic Leap & Waymo prior to moving into Web3; I know someone must be rolling their eyes but please keep reading.
The topic of AI Alignment has subsided from the mainstream day-to-day conversation, but I still keep thinking about how:
Sometimes I, personally, do not align with myself when I know I should do something but I don’t.
Alignment is hard in small teams, let alone larger orgs, where it requires strong leadership.
If we scale to nations or the entire world, the difficulty scales exponentially.
Getting alignment within a specific context requires VERY HIGH QUALITY data.
The blog post I wrote covers an approach to:
Create public data sets in a decentralized/permissionless fashion by putting incentives in place (see the toy sketch after this list).
Set a foundation for democratic inputs to AI that have “alignment” within specific contexts.
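To make the first point concrete, here is a minimal sketch of what one incentive-to-label mechanism *could* look like: labelers stake a deposit on their answer, a plurality vote decides the consensus label, and the pool (plus slashed stakes) is split pro rata among agreeing labelers. This is purely my illustration, not the actual design from the blog post, and all names and numbers are made up.

```python
# Hypothetical stake-weighted labeling bounty (illustration only, not the
# blog post's design): agree with consensus -> get paid; disagree -> slashed.
from collections import Counter
from dataclasses import dataclass

@dataclass
class Submission:
    labeler: str
    label: str
    stake: float

def settle_bounty(submissions: list[Submission], reward_pool: float) -> dict[str, float]:
    """Return each labeler's payout for a single labeling task."""
    # Consensus label = plurality vote among all submissions.
    consensus, _ = Counter(s.label for s in submissions).most_common(1)[0]

    winners = [s for s in submissions if s.label == consensus]
    losers = [s for s in submissions if s.label != consensus]

    # Losers forfeit their stake into the pool; winners split the pool
    # pro rata by stake and also get their own stake back.
    pool = reward_pool + sum(s.stake for s in losers)
    winning_stake = sum(s.stake for s in winners)

    payouts = {s.labeler: 0.0 for s in losers}
    for s in winners:
        payouts[s.labeler] = s.stake + pool * (s.stake / winning_stake)
    return payouts

payouts = settle_bounty(
    [Submission("alice", "cat", 10), Submission("bob", "cat", 5),
     Submission("carol", "dog", 8)],
    reward_pool=12.0,
)
# alice and bob split the 12.0 pool plus carol's slashed 8; carol gets 0 back.
```

The interesting property is that the payout rule, not a central curator, is what pushes labelers toward agreement within a given context.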
I have no immediate asks or agenda, but simply wanted to share some of my thinking/work in case anyone has been interested in working on a similar project.
My approach to this is built more around experience & intuition than a data-driven evaluative approach.
GPT-4 is my go-to LLM for my daily work because “it works”. I recognize it’s at the top of the leaderboards, but it also passes my personal “vibes” check, and I believe that’s a result of the RLHF data being tuned to how I think & prompt LLMs.
When I was doing adversarial planner evaluation, we didn’t need “big data” as much as we needed a very high-quality “golden data set” to evaluate against.
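A golden-set eval can be as simple as a small, hand-curated list of cases you check exact agreement against. Here’s a hypothetical Python sketch (not the actual tooling I used; the scenarios and names are made up) of why small-and-trusted beats big-and-noisy here:

```python
# Hypothetical golden-set eval: a tiny, hand-curated list of
# (scenario, expected_plan) pairs that the planner must match exactly.
GOLDEN_SET = [
    ("pedestrian enters crosswalk", "yield"),
    ("cyclist merging from right", "slow_and_yield"),
    ("stale green light ahead", "prepare_to_stop"),
]

def evaluate(planner, golden_set) -> float:
    """Fraction of golden scenarios where the planner's decision matches."""
    hits = sum(1 for scenario, expected in golden_set
               if planner(scenario) == expected)
    return hits / len(golden_set)

# Because the set is small and trusted, every miss is individually
# inspectable: evaluate(my_planner, GOLDEN_SET) == 0.67 means exactly
# one curated case failed, and you know which one.
```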
I have a gut feeling that the dataset OpenAI used in the RLHF fine-tuning step wasn’t gathered through Mechanical Turk, but was more likely a highly curated list from focused individuals.
I guess it depends on what you define as “extremely high quality” - because quality is unfortunately subjective.
My concern is that “extremely high quality” becomes “aligned with our corporate/ideological/whatever goals” - I’d instead hope for neutral, unopinionated training sets from which the capacity to adapt to any given scenario emerges naturally.
The idea behind the design I proposed in the blog post is that there’s an incentive to create many different fine-tuning data sets depending on the model architect’s needs.