GPT-4 vs Chess: Why is it so bad at it?

Has anyone tried playing chess with gpt-4? It plays illegal moves and ends up in Gotham Chess videos, where it gets mocked for the moves it makes. It still plays much better than Bard in my experiments.

So, how could you improve it to the point where it beats you?

Ah, as most prompt engineers out there have figured out, it’s all about the context!

If you include the locations of the chess pieces in every prompt, the moves already improve quite a bit. If you also spell out in your request whose pieces are where, it gets even better. And if you can list which pieces each piece is attacking, better still.
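To make this concrete, here is a minimal sketch of what "give the piece locations in every prompt" could look like. The board encoding and function name are my own illustration, not anything from the thread:

```python
# Hypothetical sketch (not from the thread): serialize the board state
# into prompt text so the model sees whose pieces are where on every turn.

def board_to_prompt(board: dict) -> str:
    """board maps squares like 'e4' to pieces like 'white knight'."""
    def side(color: str) -> str:
        squares = sorted(sq for sq, piece in board.items() if piece.startswith(color))
        return ", ".join(f"{board[sq].split()[1]} on {sq}" for sq in squares)

    return (
        "Current position:\n"
        f"White: {side('white')}\n"
        f"Black: {side('black')}\n"
        "It is White to move. Reply with a single legal move."
    )

# Toy three-piece position, just to show the output shape.
position = {"e1": "white king", "e4": "white knight", "e8": "black king"}
print(board_to_prompt(position))
```

The point is simply that the model no longer has to reconstruct the position from the move history; the state it needs is restated on every request.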

At the end of the day, it’s quite interesting how gpt-4 can solve much more complicated coding problems with far less context than what is required to play chess. With all the evals challenges, I believe that by gpt-5, or gpt-4 plus image recognition, it will soon be beating most of the player base.

Additional note: chess bot competitions are boring; it would be quite interesting to see one with trash talking… perhaps in the future?

It’s not trained on chess. There are plenty of chess bots. Garry Kasparov joked that he was the first knowledge worker who lost his job to an AI. The best human comes nowhere near the best chess AI today.

It’s good with attention, better than humans. But humans have the power of intuition - we know that some positions are strong or weak. And an experienced player knows some early gambits and how to counter them. Like a common noob move is to move a certain pawn so that the bishop can attack the opponent’s rook early, but a slightly experienced player will see that coming.

GPT probably knows the rules, but forgets them while playing. There’s a reason why ChatGPT summarizes actions it’s about to take before doing them. It’s to keep things in that window of attention.

Can it play chess well, though? We know it has terrible calculation skills, it is aware of this, and yet it’s able to code a calculator. With the help of functions and plugins, it probably could.
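For instance, a function definition in the OpenAI tools format could let the model query a real chess engine for legal moves instead of calculating them itself. The function name and the engine hookup behind it are hypothetical; only the schema shape is the real API format:

```python
# Hypothetical tool schema: the model calls get_legal_moves (a name I made
# up) and the server answers from an actual chess engine.
legal_moves_tool = {
    "type": "function",
    "function": {
        "name": "get_legal_moves",
        "description": "Return all legal moves in the current position.",
        "parameters": {
            "type": "object",
            "properties": {
                "fen": {
                    "type": "string",
                    "description": "Position in FEN notation.",
                },
            },
            "required": ["fen"],
        },
    },
}
print(legal_moves_tool["function"]["name"])
```

This moves the part GPT is bad at (exhaustive calculation) out of the model entirely, the same way a plugin calculator handles arithmetic.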

Well, I’m talking about the API. Also, the intuition part seems to be handled by increasing the temperature. All I’m saying is, there is a way to make it play like a 2000+ player with prompt engineering, without having it be trained on chess (which, to be fair, I’d like a source on; seems like an interesting read!)

What I mean is: yes, with ChatGPT or vanilla API requests you’ll get gpt-4 playing like a 400-Elo player, but with proper prompt engineering and coding/safeguarding/context you can get it to jump to a really competitive level.
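A sketch of what that safeguarding could look like: keep the list of legal moves on your side, and if the model's reply isn't one of them, feed the error back and re-prompt. The `ask_model` stub below stands in for a real API call; it and the canned replies are placeholders of my own, not a real client:

```python
# Illustrative safeguard loop: validate the model's move against the
# legal-move list and retry on illegal output.

def ask_model(prompt: str, attempt: int) -> str:
    # Placeholder for a real chat-completion call. For demonstration it
    # returns a canned illegal move first, then a legal one.
    return ["Qxh7", "e4"][min(attempt, 1)]

def get_legal_move(legal_moves: set, base_prompt: str, max_tries: int = 3) -> str:
    prompt = base_prompt
    for attempt in range(max_tries):
        move = ask_model(prompt, attempt).strip()
        if move in legal_moves:
            return move
        # Append the rejection as extra context, like correcting a human.
        prompt = base_prompt + f"\n{move} is illegal here. Choose from: {sorted(legal_moves)}"
    raise ValueError("model failed to produce a legal move")

print(get_legal_move({"e4", "d4", "Nf3"}, "It is White to move from the start position."))
```

With a wrapper like this, the "plays illegal moves" failure mode from the original post never reaches the board; the worst case is an extra API round-trip.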

Considering LLMs are much different from previous methods, I think it’s interesting to see how, with little effort, you can get them to perform much, much better on a specific task!

It’s an old post, but just in case you did not see this whole thread of X/Twitter conversation about getting Elo ratings up:

see: twitter.com/AISafetyMemes/status/1704954170619347449