I know the whole issue of “transformative use” is being debated with respect to the NY Times vs. OpenAI lawsuit, but I have a question that may or may not be related, and I’m wondering if it can be answered with current copyright law.
Let’s say I purchase a copy of Stephen King’s latest novel. I cut it up, scan it, chunk the text, and store an embedding of each chunk in my vector store. Next, I create a website, “Get Answers to Questions about Stephen King’s Latest Novel”, where people can post their questions and I answer them.
After a while, I start to use the embeddings from my vector store to find the answers.
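To make the setup concrete, here is a minimal sketch of the chunk, embed, and look-up steps I have in mind. Everything in it is an assumption for illustration: `chunk_text`, `embed`, `cosine`, and `VectorStore` are made-up names, the "embedding" is just a word-count vector standing in for a real embedding model, and the sample text is a placeholder rather than anything from the novel.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=50):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding: a sparse word-count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class VectorStore:
    """Hypothetical in-memory store holding each chunk next to its vector."""

    def __init__(self):
        self.entries = []

    def add(self, chunk):
        self.entries.append((chunk, embed(chunk)))

    def search(self, query, k=1):
        """Return the k stored chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


# Placeholder text only, not taken from any real book.
store = VectorStore()
for piece in chunk_text("The lighthouse keeper lost his key. The baker sold bread at dawn.", max_words=6):
    store.add(piece)

hits = store.search("Who sold bread?")
```

Note that the store keeps the original chunk text alongside each vector, so a lookup hands back verbatim passages from the book. At this stage only I see them, and I write the answer myself.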
So far, so good. I don’t think I’ve violated any copyright rules.
Next, I decide to use an LLM to answer the questions directly. So instead of me taking the question, submitting it to the vector store and rendering an answer, I now let the LLM do this. It does not return Mr. King’s novel text wholesale, just its own responses, which may or may not contain excerpts (depending upon the question).
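Roughly, the LLM step would look like the sketch below. Again this is an assumption-laden illustration, not any particular vendor's API: `build_prompt` and `answer` are hypothetical helpers, `generate` is whatever function wraps the model, and `fake_llm` is a stand-in so the example runs without calling anything.

```python
def build_prompt(question, retrieved_chunks):
    """Combine retrieved passages and the user's question into one prompt string."""
    context = "\n\n".join(
        f"[{i + 1}] {chunk}" for i, chunk in enumerate(retrieved_chunks)
    )
    return (
        "Answer the question using only the passages below. "
        "Paraphrase rather than quoting at length.\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question, retrieved_chunks, generate):
    """generate is a hypothetical callable (prompt -> str) wrapping the LLM."""
    return generate(build_prompt(question, retrieved_chunks))


def fake_llm(prompt):
    """Stand-in for a real model; it only reports the prompt length."""
    return f"(model reply to a {len(prompt)}-character prompt)"


# Placeholder passage only, not taken from any real book.
reply = answer("Who sold bread?", ["The baker sold bread at dawn."], fake_llm)
```

The point of the sketch is that the retrieved passages are copied verbatim into the prompt, and whether any of that text shows up in the reply depends on the model and the instructions it is given.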
My question is, where have I violated existing copyright law in either case?
Now, do I believe Mr. King should receive some sort of credit, royalty, or compensation? Yes, of course. But am I in violation of Mr. King’s copyright on the novel?