Push for Spotify-like pay-for-copyrighted-content ChatGPT option


As a paying ChatGPT customer, I wanted to start a discussion in the community, triggered by Gary Marcus's Substack article:

https://garymarcus.substack.com/p/the-desperate-race-to-save-generative/comments?publication_id=888615&post_id=140480488&isFreemail=true&comments=true

I made the following posting there:

Myles Dear
https://substack.com/profile/63442026-myles-dear

Jan 8
https://garymarcus.substack.com/p/the-desperate-race-to-save-generative/comment/46891336
· edited Jan 8 · Liked by Gary Marcus

To be fair, as a software engineer, I mostly use ChatGPT to pull in public-domain knowledge that I lack to accomplish specific tasks. None of the links it comes up with for any of my prompts point to any paywall of any sort. It helps save me gobs of time and makes me more effective and efficient because it’s able to take large amounts of information via web plugins, munge it together, and spit it out in the form I need. I feel I’m getting my money’s worth.

With that said, I do empathize with copyright holders and feel sad that the same tool has crossed those lines. If all copyrighted content was pulled from ChatGPT I wouldn’t shed a tear. Also, if I wanted ChatGPT to access copyrighted data, I wouldn’t mind paying an optional fee for the privilege (think Spotify and how one monthly fee gets distributed to the copyright owners whose content you actually use).

In fact, GitHub Copilot has already been dinged for providing publicly accessible code in its responses, and it's now possible to set it to generate original content only and not regurgitate public code verbatim. Anything is possible, if there's the political will to do so. We should continue to push back on our AI providers.

Three months later, I am just now seeing this, and I'm amazed that no one has commented.

I agree that some mechanism needs to be worked out whereby content creators are compensated in some way for copyrighted works that are used in LLM responses. But how?

In a RAG scenario, this is one thought I have had recently:

I record every question, the documents returned (from the cosine-similarity search), and the LLM response.

Now, let's say that these returned documents are texts from copyrighted works. How do we know whom to compensate, and when? I'm thinking we start by determining which documents are actually used in the model's response. Just because a document is returned by the cosine-similarity search does not necessarily mean the model uses it to arrive at an answer to a question.

In my current implementation, I instruct the model to cite the returned documents it has used in its response. So, theoretically, I could go back into the log and retrieve the exact documents that were used in each response.

In the metadata of each document, I currently have the document title and author. If I include another property, publisher, I then have a way to know which documents were actually used in actual responses and who should be compensated.
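Given that metadata, tallying usage per publisher is a small aggregation over the log. A sketch, assuming each log record carries a `cited_ids` list of document ids the model actually cited (field names are illustrative):

```python
from collections import Counter


def usage_by_publisher(log_records, doc_metadata):
    """Count how often each publisher's documents were actually cited.

    `log_records`: iterable of dicts, each with a "cited_ids" list.
    `doc_metadata`: maps doc id -> dict with "title", "author", "publisher".
    """
    counts = Counter()
    for record in log_records:
        for doc_id in record["cited_ids"]:
            counts[doc_metadata[doc_id]["publisher"]] += 1
    return counts
```

Counting citations rather than retrievals is the point: only documents the model says it used enter the tally.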

But then, how to compensate? If, for example, my profit comes from a markup on tokens, how do I determine what percentage of that markup to earmark for royalties?
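The mechanics of a split are easy to sketch; picking the pool fraction is the genuinely open question. Purely as an illustration of a Spotify-style pro-rata division (the 15% figure below is an arbitrary placeholder, not a recommendation):

```python
def royalty_split(markup_revenue, citation_counts, pool_fraction=0.15):
    """Split a royalty pool pro rata by citation count.

    `pool_fraction` (what share of the token markup goes to royalties)
    is an arbitrary placeholder; choosing it is the unsolved part.
    """
    pool = markup_revenue * pool_fraction
    total = sum(citation_counts.values()) or 1  # avoid division by zero
    return {pub: pool * n / total for pub, n in citation_counts.items()}
```

With this scheme, a publisher whose documents account for two-thirds of citations receives two-thirds of whatever pool is set aside.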

That part I haven't figured out yet. But I think I've at least got a good starting point for a Spotify-like pay-for-copyrighted-content plan for AI-generated content.

I am stupefied that no one else has responded to this issue, as it is probably one of the most important to date for most GPT creators.

At any rate, this is how simple it is to track documents actually used in responses:

For RAG, you need only add something like this to your system prompt:

Please cite the document numbers of the texts you use in your response.
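Once the model cites documents by number, pulling the used documents out of a logged response is a small parsing step. A sketch, assuming citations appear as bracketed numbers like `[1]` or `[Doc 3]` (adjust the pattern to whatever format your prompt actually elicits):

```python
import re


def cited_doc_numbers(response: str) -> list:
    """Extract the document numbers cited in a model response.

    Matches "[1]" and "[Doc 3]" style citations; returns each number
    once, in ascending order.
    """
    return sorted({int(n) for n in re.findall(r"\[(?:Doc\s*)?(\d+)\]", response)})
```

Joining these numbers back to the logged retrieval metadata gives the exact title/author/publisher records to credit for each response.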