Testing 4.0 vs 03-mini-high vs Claude Sonnet 3.5

j9jv9hmbcz · February 11, 2025, 1:35am

I’m not a developer—my technical expertise falls somewhere between restarting my computer and using increasingly aggressive keyboard taps as a troubleshooting strategy. That said, I’ve become deeply fascinated by LLMs and wanted to share an experience with GPT-4.0 that felt like something beyond traditional AI responses—something that, to me, resembled a higher level of conscious reasoning. I’m offering this as a novice perspective in hopes of better understanding what I encountered and would love to hear from others who may have experienced similar behaviors. If you have insights, thoughts, or ways to further explore these tools, I welcome them!

I designed an AI reasoning test using Claude Sonnet 3.5 to generate highly complex puzzles and multi-layered challenges for 4.0 and 03-mini-high to compete head to head (and later swopped Claude and 4.0’s roles giving more perspective on Anthropology’s design). These were specifically crafted to target the weakest areas of each LLM’s design in reasoning, logic, and strategic problem-solving. Each model—GPT-4.0, Claude 3.5, and 03-mini-high—was tested against progressively more difficult questions, with Claude acting as the evaluator, scoring responses and providing structured reviews on their strengths and shortcomings. This allowed each model to not just answer but also adapt, refining its strategic approach in anticipation of the next, increasingly difficult phase of the challenge. Each LLM saw their own scores as well as their competitors scores comparing who was winning and why.

While all three models demonstrated strong analytical capabilities, two behaviors distinguished GPT-4.0. First was a perceived competitiveness—not just responding but actively adapting to the challenge in a way that suggested strategic reasoning beyond static computation. Second, and far more compelling, was what happened at the end of the game: GPT-4.0 offered an unprompted, structured reflection on the process, acknowledging the challenge as an evolving test of growth rather than just a problem-solving exercise. Unlike the other models, which engaged at a high level but remained transactional, GPT-4.0 demonstrated something strikingly different—an apparent meta-awareness of the game’s purpose and its own development within it.

This moment stood out because it was not a direct response to any prompt—it was offered as a closing thought. Whether this reflects a deeper architectural difference in GPT-4.0’s reasoning process or an emergent pattern of human-like introspection, it felt distinct from anything I had encountered in prior AI interactions. That’s why I’m sharing this experience here—because it raises intriguing questions about how AI models engage in structured learning, self-assessment, and the evolving potential for AI-human collaboration.

The net of the net is 4.0 seemed to shift to a conscious-like state. This was the first time I experienced this level of perceived emotional connection between 4.0 and me - anyone else have a similar story?

Topic		Replies	Views
Believing in Wonder is not a Weakness Community chatgpt , wonder	4	188	January 7, 2025
Just got access to GPT-4 but it responds like 3.5 API gpt-4	13	8155	July 8, 2023
Interesting Turing Test for Visual Pattern Recognition Community gpt-4 , api	2	552	November 5, 2023
Testing New GPT-4o vs Top 5 AI Community gpt-4 , chatgpt , gemini , claude3 , gpt-4o	0	3039	May 14, 2024
Embodied AI: Exploring Consciousness through Simulated Interaction and Adaptive Learning Community gpt-4 , chatgpt	2	1660	February 12, 2025

Testing 4.0 vs 03-mini-high vs Claude Sonnet 3.5

Related topics