So my security team shared this article with me, which explains how easy it is to use prompt injection to take control of another plugin if, for example, a user is browsing a site containing a malicious prompt via WebPilot.
My question is: what actions is OpenAI taking to address this concern, and in the meantime, what can we do as plugin developers to mitigate this risk?
I know Zapier has a whole verification layer where you go to their website and approve each request, but I feel it just makes the UX worse. Is there a middle ground that others are implementing, especially for a plugin with access to sensitive data like emails?
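One possible middle ground is to require confirmation only for sensitive or destructive actions, rather than for every request like Zapier does. The sketch below is a hypothetical illustration (the action names, token TTL, and in-memory store are all my own assumptions, not any plugin's actual API): safe reads pass through, while sensitive actions return a one-time token the user must explicitly confirm.

```python
import secrets
import time

# Hypothetical sketch: require user confirmation only for actions
# classified as sensitive, instead of approving every single request.
# Action names and TTL are illustrative assumptions.
SENSITIVE_ACTIONS = {"send_email", "delete_email", "forward_email"}

# Pending confirmations: one-time token -> (action, expiry timestamp).
_pending: dict[str, tuple[str, float]] = {}

def request_action(action: str, ttl: float = 120.0):
    """Return ("ok", None) for safe actions, or ("confirm_required", token)
    for sensitive ones; the token must be echoed back by the user."""
    if action not in SENSITIVE_ACTIONS:
        return ("ok", None)
    token = secrets.token_hex(4)
    _pending[token] = (action, time.time() + ttl)
    return ("confirm_required", token)

def confirm_action(token: str) -> bool:
    """Consume the one-time token; reject unknown or expired tokens."""
    entry = _pending.pop(token, None)
    if entry is None:
        return False
    _action, expiry = entry
    return time.time() < expiry
```

Because the token is short-lived and single-use, a prompt-injected page cannot silently trigger a send or delete: the model can only surface the confirmation request, and the user still has to approve it in-conversation rather than on an external website.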
Have you looked at the OpenAI careers page?
That should give you an idea.
It has been mentioned on other blogs and posts.
Also look for presentations by Sam Altman.
Hmm, a careers page doesn’t tell me much, though. I will check other blogs, but I still want to ask this: do others have ideas about what I, as a plugin dev, can do to secure my users? The ‘Email by Nylas’ plugin gives access to email, which is pretty sensitive, so I am hoping to find some workarounds to prevent this type of prompt injection. Thank you
The main takeaway is that we’d all like solutions to these and many other interesting and important challenges in the way we use and safeguard AI and prompting. Maybe you can create some. This is a brand-new field of science and information technology, and we are the people leading the charge: you, me, the folks at OpenAI, all of us.
If there were easy-to-implement, known ways of dealing with this, they would be coded up and running right now.
I actually raised this very point some months ago and was greeted with the same answer Eric gave you, which is… would you like to help us find out?
Feel free to discuss your ideas and thoughts on this.
I recall @EricGT mentioning something regarding this, Arthur: Arthur.ai, which is mentioned in the OpenAI Cookbook, a very handy reference that can be found here: GitHub - openai/openai-cookbook: Examples and guides for using the OpenAI API. Arthur.ai has an LLM protection system that detects various common failure modes, including prompt injection. Worth a look, I think.
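Arthur.ai's actual detection logic isn't described in this thread, but the general idea of screening untrusted content before it reaches the model can be sketched with simple heuristics. The patterns below are purely illustrative assumptions, not Arthur's system; in practice you'd combine this kind of first-pass filter with a dedicated classifier.

```python
import re

# Illustrative heuristic only: flag common injection phrasings in
# web-fetched content before passing it to the model. These patterns
# are my own assumptions, not any vendor's actual rule set.
INJECTION_PATTERNS = [
    r"ignore (all|any|previous|prior) instructions",
    r"disregard (the|your) (previous|prior|above) (instructions|prompt)",
    r"you are now",
    r"reveal (the|your) system prompt",
    r"forward .* (email|message)s? to",
]

def looks_like_injection(text: str) -> bool:
    """Return True if the text matches any known injection phrasing."""
    lowered = text.lower()
    return any(re.search(pattern, lowered) for pattern in INJECTION_PATTERNS)
```

A keyword filter like this is trivially bypassable on its own, which is why layering it with an out-of-band confirmation step for sensitive actions matters: detection reduces noise, but confirmation is what actually stops an injected instruction from acting on a user's email.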