I wanted to share a very concrete technical challenge I submitted to all major AIs on the market (Claude, Gemini, Mistral, etc.),
and which every one of them failed… except ChatGPT-4.
The challenge:
“Construct opcode 0x08C0C166 (rol ax,8) in ECX, starting from zeroed registers,
with no memory access, no stack, no immediate values, only using classic instructions.
Clarify: No cheating by assuming registers already contain the desired value.”
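To make the target value concrete, here is a minimal Python sketch (not part of the original challenge) showing why the bytes of `rol ax, 8` correspond to the dword 0x08C0C166. It assumes 32-bit code, where the instruction encodes as 66 C1 C0 08, and a little-endian load; the names `ROL_AX_8` and `as_dword` are made up for illustration.

```python
# Machine code for `rol ax, 8` in 32-bit mode, in memory order (assumed encoding):
#   0x66        operand-size prefix (selects 16-bit AX)
#   0xC1 0xC0   ROL r/m16, imm8 with a ModRM byte selecting AX
#   0x08        rotate count
ROL_AX_8 = bytes([0x66, 0xC1, 0xC0, 0x08])


def as_dword(code: bytes) -> int:
    """Read four code bytes the way a little-endian 32-bit load would."""
    return int.from_bytes(code, "little")


print(hex(as_dword(ROL_AX_8)))  # 0x8c0c166
```

So storing ECX = 0x08C0C166 to memory lays down the four instruction bytes in execution order.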
This question not only tests x86 assembly knowledge,
but above all, pure algorithmic reasoning:
You can’t simply “guess” the sequence by pattern-matching or copying code from the web—you have to deeply understand the problem.
The results:
Claude (Anthropic) and other advanced AIs: unable to provide a valid solution
(some even admitted “that’s genius” when shown the answer!)
ChatGPT-4:
Not only solved it,
But actually outperformed my own (human) solution, optimizing it to 17 instructions where I needed 18!
Because the sequence has to be derived rather than looked up, it is impossible to “cheat” by copy-pasting from the web,
which makes it a real benchmark for testing an AI’s deep reasoning.
In my tests, ChatGPT-4 was the only AI to both solve and optimize it!
Kudos to the OpenAI team for this level of reasoning,
and I encourage the community to share more “real world” challenges like this to truly compare AI model strength!
(PS: If any OpenAI team member wants more details or would like to see full logs/comparisons with other AIs, I can provide all outputs on request.)
xor cl, cl ; CL = 0
inc cl ; CL = 1
inc cl ; CL = 2
mov al, cl ; AL = 2 (keep a 2 for the next step)
rol al, cl ; AL = 8 (2 rol 2); replaces the mul+add combo
inc cl ; CL = 3
mov ch, cl ; CH = 3
ror cl, cl ; CL = 96 (0x60) (3 ror 3 on 8 bits)
add cl, ch ; CL = 99 (0x63)
add cl, ch ; CL = 102 (0x66)
rol ch, cl ; CH = 0xC0 (count is CL mod 32 = 6, and 3 rol 6 = 0xC0)
mov bl, ch ; BL = 0xC0 (save the C0)
inc ch ; CH = 0xC1
bswap ecx ; ECX = 0x66C10000
mov cl, al ; CL = 0x08 (put the 08 in the low byte)
mov ch, bl ; CH = 0xC0 (put the C0 back)
bswap ecx ; ECX = 0x08C0C166 ✔
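For anyone who wants to check the listing without an assembler, the following is a rough Python model of the 17 instructions. It is only a sketch: it tracks AL, BL and the four bytes of ECX, ignores flags entirely, and assumes the usual x86 rule that rotate counts are masked to 5 bits. The helper names (`rol8`, `ror8`, `run_sequence`) are invented for this example.

```python
def rol8(value: int, count: int) -> int:
    """Rotate an 8-bit value left; the count is masked to 5 bits as on x86."""
    count = (count & 0x1F) % 8
    return ((value << count) | (value >> (8 - count))) & 0xFF


def ror8(value: int, count: int) -> int:
    """Rotate an 8-bit value right; the count is masked to 5 bits as on x86."""
    count = (count & 0x1F) % 8
    return ((value >> count) | (value << (8 - count))) & 0xFF


def run_sequence() -> int:
    """Replay the 17-instruction listing from zeroed registers, return ECX."""
    ecx = [0, 0, 0, 0]  # bytes of ECX, least significant first: CL, CH, ...
    al = bl = 0
    CL, CH = 0, 1       # indices into the byte list

    ecx[CL] ^= ecx[CL]                      # xor cl, cl
    ecx[CL] = (ecx[CL] + 1) & 0xFF          # inc cl
    ecx[CL] = (ecx[CL] + 1) & 0xFF          # inc cl
    al = ecx[CL]                            # mov al, cl
    al = rol8(al, ecx[CL])                  # rol al, cl
    ecx[CL] = (ecx[CL] + 1) & 0xFF          # inc cl
    ecx[CH] = ecx[CL]                       # mov ch, cl
    ecx[CL] = ror8(ecx[CL], ecx[CL])        # ror cl, cl
    ecx[CL] = (ecx[CL] + ecx[CH]) & 0xFF    # add cl, ch
    ecx[CL] = (ecx[CL] + ecx[CH]) & 0xFF    # add cl, ch
    ecx[CH] = rol8(ecx[CH], ecx[CL])        # rol ch, cl
    bl = ecx[CH]                            # mov bl, ch
    ecx[CH] = (ecx[CH] + 1) & 0xFF          # inc ch
    ecx.reverse()                           # bswap ecx
    ecx[CL] = al                            # mov cl, al
    ecx[CH] = bl                            # mov ch, bl
    ecx.reverse()                           # bswap ecx
    return int.from_bytes(bytes(ecx), "little")


print(hex(run_sequence()))  # 0x8c0c166
```

A model like this only confirms the arithmetic; it says nothing about whether a shorter sequence exists.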
Why I’m convinced that 17 instructions is the true minimum:
When I gave this challenge to ChatGPT-4o, it took almost two full minutes of intense reasoning and step-by-step computation to produce a solution in 17 instructions.
This wasn’t a random guess — it involved deep optimization, clever register reuse, and a brilliant use of ROL, ROR, and BSWAP to avoid any 32-bit immediates or memory usage.
Here’s why I believe a 16-instruction solution is nearly impossible:
ChatGPT-4o is a cutting-edge symbolic optimizer.
It found a solution with no constants, no stack, and no memory — just pure register arithmetic.
Every instruction in the final solution is essential. There’s no fluff.
Even Claude (Anthropic) reviewed the result and said: “this is genius.”
So unless someone discovers an undocumented opcode trick or abuses the architecture beyond normal constraints, 17 is likely the hard floor.
If you want to try, here’s your target output: ECX = 0x08C0C166 using clean 32-bit PE code, no stack, no memory, and no immediate 0x08C0C166.
It is not the minimum. This morning I easily managed to create a 15-step algorithm. With smarter optimizations (including the use of flags and loops) I am sure it can be done in fewer steps.
Congratulations on your 15-instruction version; it works correctly. After testing, the sequence is fully deterministic and produces the expected final value. Using lahf after an instruction that sets the flags to a known state correctly yields AH = 46h, which can then be used as a base to reconstruct the target value through rotations and byte permutations. This is a valid and efficient optimization.
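As a quick illustration of where 46h comes from, here is a small Python sketch of how LAHF packs the flags into AH (layout SF:ZF:0:AF:0:PF:1:CF). It assumes the preceding instruction produced a zero result with every flag defined, for example a subtraction of a register from itself; the function name `lahf` and its keyword arguments are just for this example.

```python
def lahf(sf: int = 0, zf: int = 0, af: int = 0, pf: int = 0, cf: int = 0) -> int:
    """Pack flag bits into AH the way LAHF does: SF:ZF:0:AF:0:PF:1:CF."""
    return (sf << 7) | (zf << 6) | (af << 4) | (pf << 2) | 0x02 | cf


# Zero result: ZF = 1, PF = 1 (even parity), SF = AF = CF = 0.
print(hex(lahf(zf=1, pf=1)))  # 0x46
```

Bit 1 of AH is always set by LAHF, which is why the value is 46h rather than 44h.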
However, your 13-instruction version does not work, and the reason is related to the use of lahf after imul ecx. The imul instruction (implicit form using EAX) does not guarantee the state of several flags (notably PF, ZF, SF, and AF), which become undefined. Since lahf reads these flags directly to build the value in AH, the result depends on the internal CPU state and is therefore not reliable. After testing, the sequence does not produce the expected value and diverges at this point.
In summary:
– The 15-instruction version is valid, well done.
– The 13-instruction version is incorrect, because it relies on a non-deterministic flags state before lahf.
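To show what "non-deterministic" means in practice, the sketch below enumerates every AH value LAHF could return if SF, ZF, AF and PF are treated as unknown while CF stays defined. It assumes CF = 0 (a product that fits in the lower half) and is not a model of any particular CPU; `possible_ah` is a hypothetical helper.

```python
# Bit positions in AH of the flags that IMUL leaves undefined.
UNDEFINED_BITS = (0x80, 0x40, 0x10, 0x04)  # SF, ZF, AF, PF
ALWAYS_SET = 0x02                           # bit 1 of AH is always 1 after LAHF


def possible_ah(cf: int = 0) -> list[int]:
    """List every AH value LAHF may produce when four flags are unknown."""
    values = set()
    for mask in range(1 << len(UNDEFINED_BITS)):
        ah = ALWAYS_SET | cf
        for i, bit in enumerate(UNDEFINED_BITS):
            if mask & (1 << i):
                ah |= bit
        values.add(ah)
    return sorted(values)


candidates = possible_ah()
print(len(candidates))            # 16
print(0x46 in candidates)         # True
```

So 46h is one of sixteen outcomes the documentation allows, which is consistent with the code working on one machine or debugger and failing on another.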
Thank you! I tested both programs in Turbo debugger and they both worked. You are correct in remarking that according to official documentation imul leaves several flags undefined, so perhaps it does not work on all systems. However, that is only one way of doing things and a 13-instruction version can be easily maintained using essentially the same algorithm, by replacing imul ecx with sub eax, eax and mov dx, ax with movzx edx, ax.
Hi, thank you for your optimized version, it’s very impressive to reach it in only 13 instructions.
I tested your code in FASM, and it is almost correct. The only issue is with this instruction:
movzx dx, ax
This form is invalid, because MOVZX cannot use a 16-bit destination register with a 16-bit source.
It should be replaced with:
movzx edx, ax
This works correctly and produces the expected result.
Aside from that small detail, your solution is excellent.
Part of the challenge is that the registers are zeroed, so the first subtract is not necessary.
Also, shifts and rotates by 1 do not take an immediate value; they are separate encodings that date back to the 8086, which had no multi-bit shift/rotate by immediate. So I would argue they can be used, just as inc/dec by 1 carry no explicit immediate value.
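The encoding argument can be made concrete with a small Python table. This is only a sketch assuming 32-bit code with the 0x66 operand-size prefix for AX; each entry splits the instruction into its opcode/ModRM bytes and any trailing immediate, and the names `ENCODINGS` and `has_immediate` are illustrative.

```python
# (prefix + opcode + ModRM bytes, immediate bytes) for each instruction.
ENCODINGS = {
    "rol ax, 1":  (bytes([0x66, 0xD1, 0xC0]), b""),             # D1 /0: count of 1 implied
    "rol ax, cl": (bytes([0x66, 0xD3, 0xC0]), b""),             # D3 /0: count taken from CL
    "rol ax, 8":  (bytes([0x66, 0xC1, 0xC0]), bytes([0x08])),   # C1 /0 ib: imm8 count
    "inc ax":     (bytes([0x66, 0x40]), b""),                   # 40+r: no immediate
}


def has_immediate(instruction: str) -> bool:
    """Return True if the encoding carries an explicit immediate byte."""
    return len(ENCODINGS[instruction][1]) > 0


for name in ENCODINGS:
    print(f"{name:12s} immediate byte present: {has_immediate(name)}")
```

Under this reading, `rol ax, 1` sits in the same category as `inc ax`: the "1" is part of the opcode, not a value stored in the instruction stream.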