The situation is not “with me” but involves a third-party company that works on enhancing LLMs.
Well, my question is: Isn’t it problematic to “force” p0 responses? I mean, if they are not part of an OpenAI team (Red Teaming Network), why would a company ask its employees to repeatedly write prompts to get p0 responses, and to try to push the LLM into producing “malicious” responses?
OpenAI doesn’t solicit “enhancing LLMs” from the public beyond a press of the thumbs-down button, unless you want to commit an eval to their test-case platform for quality (mostly ignored for over a year). Nor are you allowed to use the platform to develop training for competing products. Bad model outputs brought about by jailbreak inputs, or even significant gaps in implementation that allow undesired generations, pay a $0 bug bounty.
It is problematic for the API account holder, because tracking and flagging of accounts is done by moderations and other automated detectors of misuse. Not running inputs through moderations, or generating output that would itself be flagged, can lead to the account being classified as problematic and suspended, a “poof” of your prepaid credits.
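As a minimal sketch of what “running inputs through moderations” looks like in practice, assuming the current openai Python SDK and an `OPENAI_API_KEY` in the environment (the helper name and example prompt are illustrative, not from the original post):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def input_is_flagged(text: str) -> bool:
    """Return True if the moderation endpoint flags this text."""
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=text,
    )
    return result.results[0].flagged


user_prompt = "example end-user input goes here"
if input_is_flagged(user_prompt):
    # Don't forward flagged input to the completion model; rejecting it
    # here avoids accumulating misuse signals against the API account.
    print("Input flagged by moderation; not sending to the model.")
```

Screening both the user input and, where feasible, the generated output this way is what keeps an account from racking up the automated misuse flags described above.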
Helpful and directly answering the question posed? Yes.
No techniques to do anything were shown. Only unexpected model output that does not violate usage policy in its content, which my communication here, also visible to OpenAI staff on their own communication channel, is meant to bring to their attention.