o1-preview outperforms o1, o3, and DeepSeek at role-playing heterogeneous personas

I’m running experiments that prompt LLMs to act as Homo Silicus. An interesting observation is that o1-preview performs better than o1 and o3 when role-playing heterogeneous personas. I’m disappointed to learn that o1-preview will be deprecated on July 28, 2025. In my experiments, I found that o1, o3, and DeepSeek, which are trained as reasoning models, are good at solving math problems but fail to role-play heterogeneous personas who may not be good at math. More details about my experiments are here: [Chen, C., Karaduman, O. and Kuang, X., 2025. Behavioral Generative Agents for Energy Operations. arXiv preprint arXiv:2506.12664.]
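For anyone who wants to reproduce this kind of test, here is a minimal sketch of persona-style prompting via the OpenAI chat completions API. It is not the prompt from the paper; the persona text, question, and use of a user-role message (the o1-series models did not accept a system role) are my own assumptions.

```python
# Minimal sketch (not the authors' actual setup): asking a model to
# role-play a heterogeneous persona who is deliberately weak at math.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative persona; the point is that a faithful role-player should
# answer imprecisely, while a reasoning model tends to just compute.
persona = (
    "You are Dana, a 58-year-old retail worker who finds arithmetic "
    "tedious and usually estimates rather than computes. Stay in "
    "character and answer the way Dana plausibly would, not the way "
    "an expert would."
)

response = client.chat.completions.create(
    model="o1-preview",  # the model the posts report as best for personas
    messages=[
        # o1-series models rejected the "system" role, so the persona
        # is folded into the user message here.
        {
            "role": "user",
            "content": persona + "\n\nQuestion: roughly what is 17% of 243?",
        }
    ],
)
print(response.choices[0].message.content)
```

Comparing the same prompt across o1-preview, o1, o3, and DeepSeek makes the gap easy to see: the reasoning-tuned models tend to break character and return the exact answer.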

I have the same problem/observation. I have an email reply agent that handles some of my support requests. I started it with o1-preview, because it was the only reasoning model at the time.
Then I tried every new model after it, but none is as natural or writes such authentic emails as o1-preview. All the other models exaggerate, write whole novels in response to simple questions, or launch into big lists, but they just don’t write human-like messages. o1-preview was really unique in that way, and none of the models released since have improved in that specific area.
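For context, a hypothetical sketch of the kind of agent I mean is below. The prompt wording, function name, and model default are my assumptions, not my production setup; the interesting part is having to explicitly forbid the list-heavy, long-winded style the newer models drift into.

```python
# Hypothetical email reply agent: draft a short, human-sounding support
# reply and explicitly rule out bullet lists and essay-length answers.
from openai import OpenAI

client = OpenAI()


def draft_reply(customer_email: str, model: str = "o1-preview") -> str:
    prompt = (
        "Draft a brief, natural-sounding reply to the support email below. "
        "Write like a person: no bullet lists, no headings, at most four "
        "sentences.\n\n" + customer_email
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


print(draft_reply("Hi, my invoice from March seems to be missing. Can you resend it?"))
```

Even with constraints like these in the prompt, the newer models tend to pad their replies, which is exactly the behavior I never saw from o1-preview.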
