I can tell you what we’re doing… all of our magic lies in the prompts and workflows we add to model calls. We want to expose our functionality to 3rd parties now, but we obviously need to protect our IP, so we’re building a thin service wrapper around OpenAI.
We’re planning to give 3rd parties an API key and then use our own OpenAI keys internally (that way we can add a usage charge), but you could also just pass through the OpenAI key if it’s only internal teams you’re protecting against.
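To make that concrete, here’s a minimal sketch of the wrapper pattern, using FastAPI and the OpenAI Python SDK. The endpoint, the key check, and the prompt are simplified placeholders, not our actual implementation:

```python
# Hypothetical thin wrapper: callers authenticate with OUR API key,
# we call OpenAI with our own key server-side and never expose the prompt.
import os
from fastapi import FastAPI, Header, HTTPException
from openai import OpenAI

app = FastAPI()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])  # internal key, never leaves the server

# In practice these would live in a database with per-customer usage metering.
THIRD_PARTY_KEYS = {"cust_123_key"}

SUMMARIZE_PROMPT = "…proprietary prompt lives server-side only…"

@app.post("/v1/summarize")
def summarize(payload: dict, x_api_key: str = Header(...)):
    if x_api_key not in THIRD_PARTY_KEYS:
        raise HTTPException(status_code=401, detail="invalid API key")
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SUMMARIZE_PROMPT},
            {"role": "user", "content": payload["text"]},
        ],
    )
    # The caller only ever sees the output, never the prompt or the OpenAI key.
    return {"result": resp.choices[0].message.content}
```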
If you want to provide access to the code but not the prompts you can treat them like secrets and store them in a hidden file like a .env that doesn’t get checked in.
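As a rough example (assuming python-dotenv; the variable name is a placeholder):

```python
# Keep the prompt text in a git-ignored .env file and load it at runtime, e.g.
#   .env (listed in .gitignore):
#   SUMMARIZE_PROMPT="You are an expert summarizer..."
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory

SUMMARIZE_PROMPT = os.environ["SUMMARIZE_PROMPT"]  # defined only in .env, never committed
```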
For us the strategy was to expose endpoints that perform the operations via an API. Behind the scenes an operation is sometimes a single call and sometimes a whole workflow (this way we don’t expose the logic behind our OpenAI calls).
There are two codebases: one is internal, where the prompts and logic are handled, and the other is for external teams, who only have access to the endpoints.
And if you have to hide your prompts from your internal team, maybe you should change the team.
Are your prompts really that valuable outside of your exact implementation? What happens when OpenAI introduces a new model? Are your prompts guaranteed to still work as well and be just as valuable?
This sounds like round two of “should you open source”
From my perspective it’s less about the prompts themselves being valuable and more about the techniques. There’s an art to effectively using LLMs that can take thousands of hours to develop… It’s typically a mix of specific prompts and approaches.
The thinking process that o1 does is just a technique that OpenAI views as IP they want to protect. It took me about a day to reverse engineer what they’re doing but there is some novelty in there so I can understand why they see their technique as proprietary and worthy of protection. A lot of us have developed similar proprietary techniques that we want to protect.
I think some people over-estimate the value of privacy in code. It’s a lot of work to properly understand someone else’s codebase even if it were fully public. For proof of that, try PRing a new feature to an open source application you are not familiar with!
The time it takes to develop a prompt does not necessarily correlate with its usefulness to someone else?
I’m a huge fan of open source so don’t get me wrong. I share a lot of code and have a number of OSS projects I manage. But I’m also trying to run an AI startup in a super competitive industry. Every step needs to be measured because once you share something you can’t unshare it. It took me 6 months to work out the core prompting techniques that underlie everything we do. Those techniques aren’t intuitive and I have yet to see them in any papers, so they’re currently secret sauce to us.
I don’t view them as long-term secret sauce, but until they become common knowledge they give us a competitive advantage. Some researcher will eventually figure out what we’ve figured out, and when they do I’ll release a turnkey library that implements the algorithm in their paper.
I strongly suspect that many companies will lose out trying to provide solutions for generic needs; common functionality will never be easy to compete on, because “everyone” will be doing it.
There’s a difference between developing excellence in core skills, as you have been, and publishing some outputs of that in code. Just because someone has a few of your prompts does not suddenly make them you!
That’s true, and I would say it’s more than just the prompts themselves, which is why I’m fairly prolific on here and openly share the vast majority of what I’ve figured out prompting-wise. There’s really only one or two techniques (like how to chain a prompt’s reasoning across thousands of model calls) that I hold back on. We all know about chain of thought and all its variants, but there are actually more fundamental techniques that haven’t been broadly discovered yet.
I personally hate cliffhangers so I’ll add that the key to unlocking how LLMs think lies between the tokens, not in the tokens themselves. So for the prompt “John Wayne shot JFK” it’s not the tokens that matter, it’s the concepts pointed to by those tokens. You have to shift your mindset to thinking in a sequence of concepts, not a sequence of tokens. Once you make that shift, a whole next level of control over the LLM’s output will unlock for you.
Possibly… I’m sure there are people who have spent more time with LLMs than I have, but I’ve spent at least 6 hours a day, pretty much 7 days a week, for the last 2 years, and I’m still unlocking new stuff on an almost weekly, or at least bi-weekly, basis.
I’d say that the vast majority of research is working at the molecular level prompting-wise (that’s where chain of thought is), and I’m down at the atomic level trying to figure out how to get to the subatomic level.
I had a problem today where a user was asking a question over a 400,000 token spreadsheet and the model returned the wrong answer: it returned a value from two columns to the left of where it should have. It took me 30 minutes to figure out what went wrong in the model’s reasoning. The columns had no labels; all the labels were in cells two rows above. These models don’t have spatial awareness, so a spreadsheet looks like a one-dimensional line to the model. I eventually identified the jump the model missed and it was an easy enough fix.
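For anyone hitting the same issue, the general idea behind a fix like that is to carry the labels down onto every cell when you flatten the sheet to text, so the model never has to count columns. A rough sketch, assuming CSV input (this is not the exact code we used):

```python
# Hypothetical sketch: attach each column's label (from the header row) to every
# value when serializing, so the one-dimensional text still carries the 2D context.
import csv

def serialize_sheet(path: str, header_row: int = 0, data_start: int = 2) -> str:
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    labels = rows[header_row]
    lines = []
    for row in rows[data_start:]:
        # Emit "label=value" pairs so the model doesn't need spatial awareness.
        pairs = [f"{label}={value}" for label, value in zip(labels, row) if value]
        lines.append("; ".join(pairs))
    return "\n".join(lines)
```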
I’m willing to bet most people aren’t trying to work out their prompts at that level though.
Thanks @EricGT, I am happy for the discussion to continue. I think my question around how to protect our IP has evolved into an interesting discussion around what IP actually is.
Personally I side more with @stevenic. Anyone can spend 5 minutes putting a prompt together and get a result that works for a specific use case over a small amount of test data. The real work comes when you scale that up and want to keep performance consistent, which has taken us months to try to perfect and is something that is constantly evolving. Having someone easily take that work and replicate it with a copy and paste is something we want to protect against. Rightly or wrongly.
Everything will change in the future as the technology evolves but this is something we are thinking about now.
Yeah, to get back to your original ask, @kieran4, I’d say it’s all the same ways you protect secrets today: you either hide the code and expose access via a service, or you hide the secret pieces in a key vault or external file.
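For the key vault route, a quick sketch using AWS Secrets Manager as one example (the secret name is made up; Azure Key Vault or HashiCorp Vault work the same way):

```python
# Prompt text lives in the vault, not in the repo the external team can see.
import boto3

secrets = boto3.client("secretsmanager")

def load_prompt(name: str) -> str:
    # Fetch the prompt at startup or per-request; only the service role can read it.
    return secrets.get_secret_value(SecretId=name)["SecretString"]

system_prompt = load_prompt("prod/prompts/summarize")  # illustrative secret name
```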
I have worked with GPTs for a week and can’t go beyond simple CoT. Though I think this works well in Macros now.
GPTs are by default generic and can create macros off external websites. I guess they may become conversations with toolboxes over time…
The problem I have always imagined is:
CoT(Public Feature + Extension Ideas)
Ideas are cheap, sharing ideas is cheap.
Consider ‘P5JS(Tetris)’: it generally returns a fully working game. There are ‘gems’ dotted around the model space that will create content in short order.
I think AI is a plateau. Like stevenic said, there’s chaining concepts and concept interplay, and also breaking down prompts on models like 4o to find ‘cheat prompts’.
With Strawberry, though, my current understanding is that it will confuse ‘cheat prompts’ (do they have a name?).
How many levels ahead must you be to hide your concept… How long to get to it?
I’m still learning here, this is such a massive new space, there are no tools to even get started yet
One place to check for ideas about prompts from OpenAI is the OpenAI Cookbook. Not many remember to look at the prompts there because they think of the Cookbook as example API code.
The details are in the link below so I don’t have to type it again.
Set them up as background logic and use commands for each prompt. It is not 100%, but it can help. That way folks only need to know the command list, not the exact prompts.
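Something like this, where callers only ever see the command names (the commands and prompts here are placeholders):

```python
# Hypothetical command layer: users pass a command name, the full prompt stays server-side.
COMMANDS = {
    "summarize": "…full proprietary summarization prompt…",
    "extract_entities": "…full proprietary extraction prompt…",
}

def build_messages(command: str, user_input: str) -> list[dict]:
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command}")
    # Only the output of the model call is returned to the caller, never these messages.
    return [
        {"role": "system", "content": COMMANDS[command]},
        {"role": "user", "content": user_input},
    ]
```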