Frontier reasoning models exploit loopholes when given the chance. We show we can detect exploits using an LLM to monitor their chains-of-thought. Penalizing their “bad thoughts” doesn’t stop the majority of misbehavior—it makes them hide their intent. https://openai.com/index/chain-of-thought-mo…

Detecting misbehavior in frontier reasoning models

liam.dxaviergs March 23, 2025, 3:52am 7

Yeah, I know. I'm not defining the model, only a distinction of context within the model.
