Hello everyone,
I wanted to share a very concrete technical challenge I submitted to all major AIs on the market (Claude, Gemini, Mistral, etc.),
and which every one of them failed… except ChatGPT-4.
The challenge:
“Construct opcode 0x08C0C166 (rol ax,8) in ECX, starting from zeroed registers,
with no memory access, no stack, no immediate values, only using classic instructions.
Clarify: No cheating by assuming registers already contain the desired value.”
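For anyone wondering why 0x08C0C166 counts as the opcode of rol ax,8: x86 is little-endian, so the value sits in memory as the bytes 66 C1 C0 08. A minimal Python sketch to check that byte order (the helper name little_endian_bytes is my own, purely for illustration):

```python
import struct

TARGET = 0x08C0C166  # the 32-bit value the challenge asks for in ECX


def little_endian_bytes(value: int) -> bytes:
    """Return the 4 bytes of a 32-bit value in x86 (little-endian) memory order."""
    return struct.pack("<I", value)


encoded = little_endian_bytes(TARGET)
print(encoded.hex(" "))  # 66 c1 c0 08

# Byte-by-byte reading of the encoding:
#   66  operand-size prefix (selects the 16-bit register AX)
#   C1  rotate/shift group with an 8-bit immediate count
#   C0  ModRM byte: register-direct, /0 = ROL, rm = AX
#   08  the immediate rotate count, 8
```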
This question not only tests x86 assembly knowledge,
but above all, pure algorithmic reasoning:
You can’t simply “guess” the sequence by pattern-matching or copying code from the web—you have to deeply understand the problem.
The results:
- Claude (Anthropic) and other advanced AIs: unable to provide a valid solution
(some even admitted “that’s genius” when shown the answer!)
- ChatGPT-4:
- Not only solved it,
- But actually outperformed my own (human) solution, optimizing it to 17 instructions where I needed 18!
The code for those interested:
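xor cl,cl     ; cl = 0x00 (already zero from the start state)
inc cl        ; cl = 0x01
inc cl        ; cl = 0x02
mov al,cl     ; al = 0x02
inc cl        ; cl = 0x03
mov ch,cl     ; ch = 0x03
ror cl,cl     ; cl = 0x03 rotated right by 3 = 0x60
add cl,ch     ; cl = 0x63
add cl,ch     ; cl = 0x66
rol ch,cl     ; count = 0x66 mod 8 = 6, ch = 0x03 rotated left by 6 = 0xC0
mov bl,ch     ; bl = 0xC0
inc ch        ; ch = 0xC1, ecx = 0x0000C166
bswap ecx     ; ecx = 0x66C10000
mul al        ; ax = al * al = 0x0004
add al,al     ; al = 0x08
mov cl,al     ; cl = 0x08, ecx = 0x66C10008
mov ch,bl     ; ch = 0xC0, ecx = 0x66C1C008
bswap ecx     ; ecx = 0x08C0C166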
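If you want to check the sequence without firing up a debugger, here is a rough Python sketch that simulates just the handful of instructions used above. It is not a real x86 emulator: the function names (run, get8, set8, rotl8) are my own, flags are not modeled (nothing in the listing reads them), and only the 8-bit register forms plus bswap on a 32-bit register are supported.

```python
# Map each 8-bit register name to its parent 32-bit register and bit offset.
REGS8 = {"al": ("eax", 0), "ah": ("eax", 8), "bl": ("ebx", 0),
         "bh": ("ebx", 8), "cl": ("ecx", 0), "ch": ("ecx", 8)}


def get8(state, name):
    reg, shift = REGS8[name]
    return (state[reg] >> shift) & 0xFF


def set8(state, name, value):
    reg, shift = REGS8[name]
    state[reg] = (state[reg] & ~(0xFF << shift)) | ((value & 0xFF) << shift)


def rotl8(value, count):
    # x86 masks the rotate count to 5 bits; an 8-bit operand then wraps mod 8.
    count = (count & 0x1F) % 8
    return ((value << count) | (value >> (8 - count))) & 0xFF


def run(program):
    """Execute the listing starting from zeroed registers; return the registers."""
    state = {"eax": 0, "ebx": 0, "ecx": 0}
    for line in program.strip().splitlines():
        op, _, args = line.strip().partition(" ")
        operands = [a.strip() for a in args.split(",")]
        dst = operands[0]
        src = operands[1] if len(operands) > 1 else None
        if op == "bswap":    # reverse the 4 bytes of a 32-bit register
            state[dst] = int.from_bytes(state[dst].to_bytes(4, "little"), "big")
        elif op == "mul":    # 8-bit form: AX = AL * operand
            product = get8(state, "al") * get8(state, dst)
            state["eax"] = (state["eax"] & 0xFFFF0000) | (product & 0xFFFF)
        elif op == "inc":
            set8(state, dst, get8(state, dst) + 1)
        elif op == "mov":
            set8(state, dst, get8(state, src))
        elif op == "xor":
            set8(state, dst, get8(state, dst) ^ get8(state, src))
        elif op == "add":
            set8(state, dst, get8(state, dst) + get8(state, src))
        elif op == "rol":
            set8(state, dst, rotl8(get8(state, dst), get8(state, src)))
        elif op == "ror":    # rotate right by n == rotate left by 8 - n
            set8(state, dst, rotl8(get8(state, dst), 8 - get8(state, src) % 8))
        else:
            raise ValueError(f"unsupported instruction: {line!r}")
    return state


PROGRAM = """
xor cl,cl
inc cl
inc cl
mov al,cl
inc cl
mov ch,cl
ror cl,cl
add cl,ch
add cl,ch
rol ch,cl
mov bl,ch
inc ch
bswap ecx
mul al
add al,al
mov cl,al
mov ch,bl
bswap ecx
"""

print(f"{run(PROGRAM)['ecx']:#010x}")  # 0x08c0c166
```

Under those assumptions the listing does leave 0x08C0C166 in ECX. Note that the listing as posted contains 18 instructions.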
Why share this here?
This challenge is:
- 100% reproducible
- Impossible to “cheat” by copy-pasting from the web,
- A real benchmark for testing an AI’s deep reasoning,
- And, in my tests, ChatGPT-4 was the only AI to both solve and optimize it!
Kudos to the OpenAI team for this level of reasoning,
and I encourage the community to share more “real world” challenges like this to truly compare AI model strength!
(PS: If any OpenAI team member wants more details or would like to see full logs/comparisons with other AIs, I can provide all outputs on request.)