I found an interesting post on X regarding how OpenAI assesses the security of other plugins. Can anyone verify this content, or does anyone have experience/insight with using this prompt? I cannot post a link, so here is a broken one:
twitter .com/rez0__/status/1645861607010979878?s=20
Yes, the content is verified. I do not have experience using this specific prompt under these conditions. However, the overall structure and method presented for creating a custom role designed to evaluate Object X for Conditions Y & Z is nearly identical to practical roles I have worked on.
Thanks for the response! May I ask what you mean by verified? Did OpenAI ever acknowledge it? Also, do you know if the plugin moderation process is fully automated, partially automated, or manual review?
Thanks!
No problem! Verified to have come from the API hacking efforts earlier this year, although it would not have been hard to just throw it in with other authentic screenshots and let it make the rounds. I don’t think I have seen OpenAI confirm much in terms of leaks.
I cannot say for sure how it was handled. Based on the prompt and its behavior, I would guess it was a semi-automated process that began when the plugin was submitted. From there, a risk level was assigned, and the submission likely went through at least some form of human review before being accepted or rejected. However, the rapid explosion of non-functional plugins suggests automation may have been rolled out before the process was fully prepared for it.
To that end, I suspect this is an additional reason why moderation endpoints exist in the documentation. Calling those endpoints in your own development likely helps reduce OpenAI’s reliance on resources like this plugin evaluator to moderate plugins and other features.
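For anyone curious what "putting those endpoints in place" looks like in practice, here is a minimal sketch of calling OpenAI's documented `/v1/moderations` endpoint with only the Python standard library. The endpoint URL, request body (`{"input": ...}`), and the `results[0]["flagged"]` field are from OpenAI's public API reference; everything else (function names, the gating pattern) is illustrative, not any official integration.

```python
import json
import os
import urllib.request

# Documented endpoint from OpenAI's API reference.
MODERATION_URL = "https://api.openai.com/v1/moderations"


def build_payload(text: str) -> dict:
    """Request body for the moderation endpoint, per the docs."""
    return {"input": text}


def moderate(text: str, api_key: str) -> dict:
    """POST `text` to the moderation endpoint and return the parsed JSON.

    Hypothetical helper: a real plugin would call something like this on
    user-supplied input before acting on it, so flagged content is caught
    on the developer's side rather than left entirely to OpenAI's review.
    """
    req = urllib.request.Request(
        MODERATION_URL,
        data=json.dumps(build_payload(text)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only make a live call if a key is actually configured.
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        result = moderate("user-supplied text to screen", key)
        # `flagged` is the top-level boolean in the documented response.
        print("flagged:", result["results"][0]["flagged"])
```

The idea is simply that if every plugin pre-screens its own inputs this way, the centralized evaluator has far less to catch.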