The forum requires that people write replies that are longer than short one-liners. It can get a little frustrating, but it is what it is.
I think it’s a rather arbitrary and haphazard rule - but as you say: it is what it is.
ChatGPT refuses to cite from copyrighted material:
https://chat.openai.com/share/d331255e-730a-4d45-a167-e2cd53826dff
ChatGPT generates content in the style and with the characters of a copyrighted book:
https://chat.openai.com/share/3cfcfc7f-29d4-4e7c-9737-54341ce292ec
“Mr und Mrs Dursley im Ligusterweg Nummer 4 waren stolz darauf, ganz und gar normal zu sein, sehr stolz sogar.”
I see what you mean! Thanks for checking it out!
Maybe so, but I think we are in a time where lots of people are antsy about this new technology. I just scraped a site of 900 posts consisting of legal articles and case law. Here is what the website posts:
Readers do not have to request permission to reprint items, however all reprinted items must bear one of the two following attributions:
If your reprint is electronic, as follows, keeping the link intact:
Reprinted from blah, blah.
Of course, every post I uploaded has the required citation.
Now, when they wrote this (probably 10 years or more ago), they had no idea a day would come when someone would not only download every article posted, but feed that into a computer to help generate answers to questions.
To be clear, I have no interest whatsoever in reproducing this information for publication. I am not their competitor. I only use it as part of my "Deepening Comprehension through Complementary Content" strategy, which I discussed here: How to Fine-Tune without Fine-Tuning -- Or, How to Make your RAG Implementation Smarter
To that end, whenever a query returns a citation that references their content, the associated link goes to their website, not mine. I don’t know how much more transparent I could be.
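For anyone wondering what that looks like in practice, here is a minimal sketch, assuming each scraped post is stored with its original URL and the attribution line the site asks for. The names `SourceDocument`, `format_citation`, and `cite_sources` are hypothetical and only illustrate the idea; this is not the actual implementation behind the setup described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDocument:
    """One scraped article plus the metadata needed to credit it (hypothetical schema)."""
    title: str
    source_url: str   # link back to the publisher's own page, not a local copy
    attribution: str  # attribution line required by the site's reprint terms
    text: str


def format_citation(doc: SourceDocument) -> str:
    """Build the citation shown next to an answer that draws on `doc`."""
    return f"{doc.title} ({doc.attribution}) <{doc.source_url}>"


def cite_sources(retrieved: list[SourceDocument]) -> list[str]:
    """Return one citation per distinct source URL, in retrieval order."""
    seen: set[str] = set()
    citations: list[str] = []
    for doc in retrieved:
        if doc.source_url in seen:
            continue  # several chunks of the same article need only one citation
        seen.add(doc.source_url)
        citations.append(format_citation(doc))
    return citations
```

The point of keeping `source_url` and `attribution` on every record is that whatever the retrieval step returns, the citation can always point at the publisher's page and carry the wording their terms require.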
But, how much do you want to bet I’m going to be hearing from them when they find out? How do you think they are going to react, even though I have completely complied with their terms of use?
So, yeah, I think we’re going to see all kinds of people coming out of the woodwork – especially lawyers. Nonsense or not. When has that ever stopped them?
I came across this interesting series of articles in The Atlantic:
The author went ahead and investigated this issue and is making some valuable, fair points.
What stands out to me are the points made about the big tech companies:
- they admit that they used datasets containing large amounts of copyrighted books,
- there is consensus in the developer community that these books have high value for LLM training, and
- this type of piracy by large companies differs from earlier piracy, when consumers copied copyrighted material for personal use rather than for monetary gain.
As a heavy user, developer, and full-blown enthusiast of AI, I cannot simply dismiss these arguments.
Here is the link to the author’s profile:
https://twitter.com/_alexreisner