This is an interesting question.
How could you argue it as transformative?
Seems like a summary, almost by definition of the word, is derivative.
So if you are providing ‘summary’ for profit, to others, of online content to which you don’t have legal access for that use, yeah, seems to me you have a problem.
Surprised? Am I missing something?
On the other hand, if you are bringing other information (like your prompt wording) into the mix, maybe something like ‘summarize with respect to …’, maybe then it becomes transformative? Or maybe if the summary is over several docs, not just one?
I’m not a legal professional, but I’ve worked with user-generated content in other contexts in previous lives.
Here’s how I would think about this for my own work:
Copyright needs a certain degree of uniqueness to be protected: you can’t copyright basic facts. You also typically can’t copyright turns of phrase.
Collections have a “collection copyright,” meaning you can’t take a dictionary, where each word is not copyrightable, and re-publish it, because the work of collecting it is copyrightable.
You can however make your own dictionary and publish it, even though it uses the same words.
I can read a scientific paper or novel and summarize the general idea or plot, without infringing on the paper or novel. (Else book reviews wouldn’t be possible!)
Machine output is almost never copyrightable: a user needs to be “in control” for the output to be copyrightable. You can’t copyright the workings of a “Blur” filter in Photoshop, but you can copyright a picture where a human supplies the input image and chooses the “Blur” filter.
“Reading” anything that is publicly available is protected. “Taking notes” is also allowed, as long as it’s not massive cloning of the data.
Destructive transformation is largely permissible, because you can’t re-construct the original data/input.
From these data points, I would think that the output of large language models is not copyrightable. I would also believe that, unless the model generates word-for-word clones of large swathes of its input, the output would not infringe on the copyright of the material it read, took as input, or took notes on during training.
Of course, this is the US, and where there is huge money to be made, everybody will try to get their slice of the pie, ideally with as little work as possible. So I expect this to be massively litigated (it’s already started, but I think we haven’t nearly reached the peak), and I would largely work on applications that aren’t about trying to get close to replicating the upstream training sets in output.
You can of course have a different opinion: people who own lots of rights will have one slant; people who make money when contracts need to be drawn up or litigated will have another; and so on. Nothing is yet Truth in this area, as far as I can tell.
At least in the United States (as with most places), you cannot hold a copyright on an idea, only the creative expression of such.
That’s not to say a copyright-holder couldn’t sue someone who wrote and published a summary of a work for which they hold the rights, but they would be hard pressed to actually win.
The problem with the original question is that it’s presented as a dichotomy: derivative or transformative. In reality a work can be neither, either, or both, and neither property in and of itself determines whether something constitutes a copyright violation.
It absolutely can be murky, but if you think through what the natural and logical consequences of your initial take would entail, I suspect it might clear up a bit.
For instance, what is a “review” other than a special kind of summary?
I’m not suggesting they’re the same, but they’re similar. A review is very much a derivative work that couldn’t exist without the original, but no one (except maybe salty creators) would argue reviews should be illegal.