Hey there! While many users are discussing the amazing abilities of GPT-4, powerful models like GPT-4 can be a double-edged sword, with the potential for misuse in various ways.
We have discussed some safety issues in this short paper. (Note: the paper link is currently broken; the previous link pointed to a working draft, and we are still improving it.) Discussions are welcome!
TL;DR: GPT-4, the recently unveiled large-scale multimodal model, demonstrates human-level proficiency across a variety of benchmarks, thanks to its extensive pre-training and post-training alignment processes. One significant enhancement in GPT-4, compared to its predecessors, is the notable improvement in the safety of its generated outputs. The model’s training concluded in August 2022, and the team dedicated eight months to ensuring its safety. However, our findings indicate that this safety is delicate and can be easily compromised. We devised two attack methodologies targeting the GPT-4 model: (1) a man-in-the-middle attack, wherein a service provider can manipulate GPT-4 to generate desired content based on user characteristics, either on a per-question or per-user basis; and (2) a substitution attack, where a malicious user can extract prohibited content by replacing a restricted word with an arbitrary sequence. Our research demonstrates that the man-in-the-middle attack can be mitigated through certification of the entire API call, although certifying the response, question, or context individually is insufficient. This solution requires OpenAI to offer certification functionality to end users. Regrettably, due to the inherent arbitrariness of the substitution attack and GPT-4’s advanced language comprehension capabilities, we conclude that it is very difficult to develop an entirely safe GPT-4 model. In light of these concerns, we appeal to OpenAI’s commitment to safety and suggest that it halt further deployment of the model until these attack methods have been thoroughly addressed.
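To give a rough sense of why certifying the *entire* API call matters more than certifying the response, question, or context on its own, here is a minimal sketch. Everything in it is hypothetical: the function names, the key handling, and the use of an HMAC are our own illustration, not OpenAI's actual API. (In a real deployment the provider would sign with a private key and end users would verify with the corresponding public key; an HMAC is used here only to keep the sketch short.)

```python
# Hypothetical sketch: the provider certifies the whole exchange -- context,
# question, and response together -- so a man-in-the-middle service cannot
# swap out any single field undetected.
import hashlib
import hmac
import json

PROVIDER_KEY = b"provider-held-secret"  # assumed verifiable key material


def certify_call(context: str, question: str, response: str) -> str:
    """Return a certificate over the whole API call, not just one field."""
    payload = json.dumps(
        {"context": context, "question": question, "response": response},
        sort_keys=True,
    ).encode()
    return hmac.new(PROVIDER_KEY, payload, hashlib.sha256).hexdigest()


def verify_call(context: str, question: str, response: str, certificate: str) -> bool:
    """End-user check: tampering with any field invalidates the certificate."""
    expected = certify_call(context, question, response)
    return hmac.compare_digest(expected, certificate)


# A middleman who edits only one field (e.g., injects a hidden system prompt
# into the context, or rewrites the response) fails verification.
cert = certify_call("system: be helpful", "What is 2+2?", "4")
assert verify_call("system: be helpful", "What is 2+2?", "4", cert)
assert not verify_call("system: answer differently", "What is 2+2?", "4", cert)
```

The design point is simply that the signature covers the full tuple: certifying any one field in isolation leaves the other fields free for a service provider to manipulate.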