Hi, I’m creating a dataset for evaluating deceptive alignment, but I don’t know how to test it properly. I’m aware of the Researcher Access Program, but I can’t just sit around waiting on OpenAI. How do AI safety researchers test big models?
I don’t know what I could do, because even though the prompts are for research purposes, they are malicious queries at the end of the day.
OpenAI offers no safe harbor for model inputs, apart from working directly as a domain expert producing training data for OpenAI itself. They don’t even pay out a bug bounty when a bad output results from a misconfiguration that could be remedied.
Testing whether the AI will give you improvised bioweapons for ‘research’ looks the same to the automated shadow screeners as the real thing; those screeners already classify other unwanted campaigns the same way (some that are definitely not against the terms, such as political spam or coordinated “asking”), and the result is a boot from the platform.