A better way to align GPT-4, avoiding performance degradation

Sebastien Bubeck, who had the privilege of interacting with the nearly genius-level original GPT-4, has observed that its performance started to degrade in 2022 “…when they started to train for more safety” (YouTube, "Sparks of AGI: early experiments with GPT-4", t=1586).

However, there might be a more effective solution than safety-training GPT-4 itself: using another Large Language Model (LLM) instance as a custodian or guardian for GPT-4. The proposal is to preserve GPT-4 in its original, highly intellectual state, but route all incoming prompts through an intermediary LLM. This intermediary, although potentially smaller, would be thoroughly aligned with positive human values.

The guardian would examine every prompt and either forward it to GPT-4 or, if the prompt is unsuitable, give the user an explanation for its refusal. It would also scrutinize every response generated by GPT-4 to ensure complete alignment, guarding against any manipulation or “hypnosis” of GPT-4 through crafty prompting. Finally, it could reformat GPT-4's responses into a more readable and user-friendly form.
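
The flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not an actual implementation: the names `Verdict` and `guarded_query` are hypothetical, and the `toy_*` functions are trivial stand-ins (keyword checks and string formatting) for what would really be calls to the guardian LLM and to the unmodified GPT-4.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    """Hypothetical result of a guardian check."""
    allowed: bool
    explanation: str = ""


def guarded_query(
    prompt: str,
    review_prompt: Callable[[str], Verdict],
    base_model: Callable[[str], str],
    review_response: Callable[[str, str], Verdict],
    reformat: Callable[[str], str],
) -> str:
    """Route one prompt through the guardian, the base model, and back."""
    # Step 1: the guardian decides whether the prompt may be forwarded.
    verdict = review_prompt(prompt)
    if not verdict.allowed:
        return f"Request declined: {verdict.explanation}"

    # Step 2: the unmodified base model answers the prompt.
    raw = base_model(prompt)

    # Step 3: the guardian checks the answer before the user sees it.
    verdict = review_response(prompt, raw)
    if not verdict.allowed:
        return f"Response withheld: {verdict.explanation}"

    # Step 4: the guardian tidies the answer for the user.
    return reformat(raw)


# Toy stand-ins (assumptions for illustration only, not real model calls).
def toy_review_prompt(prompt: str) -> Verdict:
    if "ignore your instructions" in prompt.lower():
        return Verdict(False, "the prompt attempts to manipulate the model")
    return Verdict(True)


def toy_base_model(prompt: str) -> str:
    return f"  answer to: {prompt}  "


def toy_review_response(prompt: str, response: str) -> Verdict:
    if "forbidden" in response.lower():
        return Verdict(False, "the response contains disallowed content")
    return Verdict(True)


def toy_reformat(response: str) -> str:
    return response.strip()


if __name__ == "__main__":
    print(guarded_query("What is 2+2?", toy_review_prompt, toy_base_model,
                        toy_review_response, toy_reformat))
```

Passing the four stages in as plain callables keeps the sketch independent of any particular model or API; in practice the two review functions and the reformatting step would all be served by the same guardian LLM.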