By compelling the developers of said language model to disclose their training data via court order.
Perhaps, but LLMs are slippery. The larger, more complex, and more powerful they become, the more difficult it is to keep them playing within the boundaries.
It might at least reduce the number of lawsuits OpenAI will need to respond to.
That said, I think the authors are going to have an uphill battle here, because the genie is out of the bottle.
The cost of a unit of computation decreases by an order of magnitude every 3–4 years. That’s roughly a 1,000-fold reduction in training cost over a decade. So, if GPT-4 cost ~$1B to train today, we would expect it to cost ~$1M to train in 2033 and ~$1k to train in 2043.
That’s based solely on the historical decline in the cost of compute. Now, there are companies today designing chips specifically for training and inference for large language models, so the cost of compute for training a new AI might go way down in the very near future.
Add to that the inevitable algorithmic advances over the next 10–20 years…
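The hardware-only projection above is easy to reproduce. This is a minimal sketch, assuming a steady tenfold drop every 3 to 4 years and the hypothetical ~$1B starting cost from the post; the function name and the 10/3-year midpoint are my own choices for illustration, not measured values.

```python
def projected_cost(cost_today: float, years: float, years_per_10x: float = 10 / 3) -> float:
    """Projected training cost after `years`, assuming cost falls 10x
    every `years_per_10x` years (an assumed, constant rate)."""
    return cost_today / 10 ** (years / years_per_10x)


COST_TODAY = 1e9  # hypothetical ~$1B figure used in the post

# Compare the optimistic (3 yr), midpoint (10/3 yr), and pessimistic (4 yr) rates.
for years_per_10x in (3.0, 10 / 3, 4.0):
    in_10 = projected_cost(COST_TODAY, 10, years_per_10x)
    in_20 = projected_cost(COST_TODAY, 20, years_per_10x)
    print(f"10x every {years_per_10x:.2f} yr: ${in_10:,.0f} in 10 yr, ${in_20:,.0f} in 20 yr")
```

The midpoint rate gives the ~$1M and ~$1k figures. A tenfold drop every 3 years puts the ten-year cost near $0.46M, and every 4 years puts it near $3.2M, so the estimate is sensitive to the assumed rate but the direction is the same.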
Then consider that you don’t even need to train your own AI from scratch. You could simply fine-tune an AI on a particular author or set of authors yourself.
Take George R.R. Martin: the first five books in A Song of Ice and Fire have 1,736,054 words, which is probably about 2.6M tokens. If you did four training epochs, that’s about 10.4M total training tokens.
If we assume training costs about 6x the usage cost of a model, and we take the most expensive OpenAI model, gpt-4-32k, at $0.06/1k tokens, we might expect training to cost ~$0.36/1k tokens. With 10.4M total training tokens, that’s about $3,750 to fine-tune OpenAI’s most expensive model on the entire text of A Song of Ice and Fire ($83 on gpt-3.5-turbo).
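Here is the arithmetic behind those numbers as a rough sketch. The 1.5 tokens-per-word ratio is inferred from "about 2.6M tokens", the 6x training multiplier is the assumption stated above, and the ~$0.008/1k rate for gpt-3.5-turbo is back-computed from the $83 figure rather than taken from a price list. At exactly 1.5 tokens per word, four epochs come to about 10.4M tokens.

```python
WORDS = 1_736_054        # first five books, per the post
TOKENS_PER_WORD = 1.5    # assumption: implied by "about 2.6M tokens"
EPOCHS = 4


def fine_tune_cost(words: int, tokens_per_word: float, epochs: int, price_per_1k: float):
    """Return (total training tokens, estimated cost in dollars)."""
    total_tokens = words * tokens_per_word * epochs
    return total_tokens, total_tokens / 1000 * price_per_1k


# Assumption: training costs about 6x the $0.06/1k usage price of gpt-4-32k.
gpt4_32k_rate = 0.06 * 6
# Assumption: rate implied by the $83 estimate, not a quoted price.
gpt35_rate = 0.008

tokens, gpt4_cost = fine_tune_cost(WORDS, TOKENS_PER_WORD, EPOCHS, gpt4_32k_rate)
_, gpt35_cost = fine_tune_cost(WORDS, TOKENS_PER_WORD, EPOCHS, gpt35_rate)

print(f"total training tokens: {tokens:,.0f}")
print(f"gpt-4-32k estimate:     ${gpt4_cost:,.0f}")
print(f"gpt-3.5-turbo estimate: ${gpt35_cost:,.0f}")
```

Changing the tokens-per-word ratio or the epoch count scales both estimates linearly, so the result is only as good as those guesses.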
Even if we imagine his total output is ten times that, and we’re going to do the fine-tuning on 100 equivalent authors, it doesn’t matter… Time always wins.
When anyone in the world can fine-tune a model on an author’s entire body of work for pennies, it becomes rather a moot point.
The simple fact is, a lot of things are going to need to change dramatically throughout all of human society over the next generation. Lawsuits like this might gum up the works for a few years, but in ten or twenty years it’s not going to matter.