What's the best practice when using a copyrighted file for a GPT?

I am working with an author to build a GPT for his book. We are using the copyrighted PDF of the book as the file input to the GPT. The PDF is normally sold through the publisher for about $75, so we don’t want users to be able to download it from the GPT. But it seems a GPT user can currently do this: 😱 Concerns About File Information Extraction from GPTs Uploads - Community - OpenAI Developer Forum. What’s the best practice recommended by OpenAI for this scenario?


I haven’t tested, but I don’t think it’s possible to download the PDF?

Note the last bit from the website in question, which got the file names…

Thanks @PaulBellow. In parts of the thread I linked to above, the discussion indicates that if the code interpreter isn’t enabled, the user cannot download the file directly (though they can still get at it a piece at a time). For my GPT, the code interpreter is already disabled, but I’m wondering whether this is a safe approach in the long run (i.e., whether OpenAI will stand behind protecting the PDF from download).

If you upload a file to be used as reference for a Custom GPT you should assume that someone will be able to either,

A. Download the file, or
B. Extract the entire text of the file verbatim

This may change in the future, but right now that is the only guidance anyone can really give.

You simply aren’t able to give access to the text via the GPT while simultaneously restricting access to the text.

If it’s information you want to give the user access to and you give the user access, the user will be able to access it.

There may be ways to make it more challenging, but that will likely have only two outcomes, neither of which you want:

  1. A generally worse experience for your users as the GPT dances around trying to avoid giving them too much information.
  2. You create a game you can’t win with a subset of users motivated and skilled at breaking such protections.

Maybe make your own custom bot and only give access to those who pay the $75 for the PDF… and offer a watered-down version as the Custom GPT?

That’s a thought, but the downsides of a custom bot as I see it:

  • I have to develop/maintain/host it
  • I have to use my own API key
  • I don’t think any users will want to pay $75 up-front

Can you elaborate on your idea of a watered-down version for the Custom GPT?


I was just thinking maybe have GPT-4 summarize the main points - just a tease / example of what they’ll get with the full $75…

Is the PDF not worth $75?


So is it okay to upload copyrighted content to GPTs? (I would argue that it is, especially for non-fiction.)
Does the “copyright shield” apply?

I am doing this as well (for Marc Randolph, author of “That Will Never Work”).

I’ve been testing this over several weeks. Around November, when the platform launched, it was indeed possible to download and access files from the Knowledge dataset.

However, in recent weeks this seems to have been stopped. The GPT will no longer print out long passages from the book, and it definitely doesn’t let you download the file. That said, we’ve seen people jailbreak ChatGPT before with tricks like being polite and persistent, and if the behavior has changed in one direction, it could change back. For now, though, I think you’re safe.

Hope that helps!


Are you worried about the $75 book being stolen through the GPT Store? If you stuff the whole book in, I think you’d be better off worrying about who picks it up once the GPT is on the bestseller list. Stuffing an entire book into the files isn’t a good idea anyway. And if you don’t label it as a $75 file in the store, no one is going to put effort into stealing a GPT Store GPT’s data without knowing what they’ll get.


I’m not worried about people stealing uploaded documents.
My thinking is as follows:
Facts cannot be protected by copyright; original work can be.
For example, if someone writes a book about Napoleon Bonaparte it will contain a lot of historical facts like the Battle of Waterloo was on June 18, 1815.
Let’s say I build a Napoleon GPT, it can use these facts, no problem.
I can read the book on Napoleon, take notes, upload these notes to Napoleon GPT… no problem.
Can I upload the original book? - so I don’t have to take notes…
I mean, it’s all in the background, hidden from users.
What do you think? What is the consensus?
Is there a difference if this is a private or a public GPT?

I don’t care much about rights. I was born in a country where the government leaked the personal information of 10 million people, so there is no longer anything to worry about regarding personal information. But maybe I have enough sense of justice when it comes to AI: I chose to buy the book, and then I tell the seller that it is used with AI. Choosing to purchase data to use with support tools is my right in this matter. In the future, if anyone wants to take an anti-AI stance, it would be better to state it clearly on the cover, if that helps.

But using the entire file in the format you received it in would be ineffective. Writing guidelines so the AI knows what it needs to do with the file helped a lot. In addition, cutting out unnecessary content lets the AI access more of the document per load.
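A minimal Python sketch of the “cutting out unnecessary content” step, assuming the book’s text has already been extracted into one string per page (the extraction step and the `clean_pages` name are hypothetical, not part of any OpenAI tooling). It drops lines that are only page numbers, plus lines that repeat on many pages, such as running headers and footers, before the result is uploaded as a knowledge file.

```python
import re
from collections import Counter


def clean_pages(pages: list[str], repeat_threshold: float = 0.5) -> str:
    """Hypothetical helper: strip page numbers and running headers/footers.

    Assumes `pages` holds already-extracted plain text, one string per page.
    """
    # Count on how many pages each non-empty line appears (once per page).
    counts = Counter()
    for page in pages:
        counts.update({line.strip() for line in page.splitlines() if line.strip()})

    # A line repeated on at least this many pages is treated as a header/footer.
    # The 0.5 default is an arbitrary assumption; tune it per book.
    min_pages = max(2, int(len(pages) * repeat_threshold))

    kept = []
    for page in pages:
        for line in page.splitlines():
            text = line.strip()
            if not text:
                continue  # skip blank lines
            if re.fullmatch(r"(page\s*)?\d+", text, flags=re.IGNORECASE):
                continue  # skip bare page numbers like "12" or "Page 12"
            if counts[text] >= min_pages:
                continue  # skip running headers/footers
            kept.append(text)
    return "\n".join(kept)


pages = [
    "My Book\nChapter 1\nIt was a dark night.\n1",
    "My Book\nThe story goes on.\n2",
    "My Book\nThe end.\n3",
]
print(clean_pages(pages))
```

With the three sample pages, the repeated “My Book” header and the page numbers are removed and only the body lines remain. A repetition-based filter like this can also drop legitimate lines that happen to recur, so it is worth skimming the output before uploading it.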