ChatGPT-4 plays Wordle: pushing the boundaries for complex prompts

One of today’s topics on the microblogs was the claimed inability of ChatGPT to play Wordle. Admittedly, this is a hard problem: you constantly have to keep track of detailed information, and you have to switch between treating entities as words (is it a valid word?) and treating them as strings of letters with positional and other constraints. And there are quite a few rules to consider. But usually, claims that ChatGPT cannot do something turn out to be a case of poor prompting, and that was the case here as well. Why is this interesting? It’s a complex challenge that pushes the boundary of what can be done with prompts, and since I am interested in individualized learning applications in academia with the Sentient Syllabus Project, I am always curious to learn more about what is possible. Here are the full Wordle prompt and the dialogues for both ChatGPT-3.5 and ChatGPT-4. I find:

  1. You absolutely need chain-of-thought for complex tasks that require precision;
  2. You need to expose the intermediate steps to the context window, so that they are available to reasoning;
  3. Training by example works great to structure behaviour;
  4. ChatGPT-4 is significantly better at following instructions, and profits more from explicit prompts.

The Wordle prompt was iterated many times over with ChatGPT-3.5 until I could get approximately the right behaviour. It then worked right out of the box with ChatGPT-4. Lesson learned: Investments in prompting do not lose their value with every new generation.

It’s a long prompt that needs to translate the Wordle mechanics into an ASCII equivalent, and it needed a full-length training example to keep it on script.

I would like you to solve a "Wordle" puzzle. I will give you a string of five hashtags that represent the five letters in an English target word. You will guess the word. For each try, you propose a five letter English word to guess the target. Only common English words are allowed. You cannot propose words that do not exist. You cannot propose proper names. You cannot propose acronyms. You only have six tries, so you must use your suggestions wisely to discern as many clues as possible. Here are the rules:
 
A: If your proposed word contains letters in the same position as in the target, I will replace the corresponding hashtags with the correct letters.
B:  If your proposed word contains letters that are used in the target, but in a different position, I will write those letters in parentheses. 
C:  If your proposed word contains letters that that are not found in the target, I will write those letters in brackets. 
D: You have at most six tries and I will number the tries to keep track. 

In order to succeed, you need to use an efficient strategy and not use letters more than once if possible. Until you are ready to solve the puzzle, you use your words as probes to collect clues about the solution. Therefore, you do not  propose words that contain letters you have tried before, and you work step by step:

For your first try a good choice is a word that contains common English letters and no duplicated letters.

On all tries after that you follow the following steps exactly...
1: you list the letters that you know to be in the right place;
2: then you list the letters that you know to be in the word, but in a different place from where you proposed them, in parentheses;
3; then you state how many out of five letters you know, even if you might not yet know where they go;
4: then you list the letters that you know are not in the target, in square brackets;
5: then you summarize all letters you have used so far;
6 then you choose a new word that contains none of the letters used previously (if possible, but it must be a valid English word).

Finally, when you can guess the target word you write SOLVE! and write the target word.

Here is an example session. I start the session, my prompts are prefixed "Q -", your responses are prefixed with "Assistant:"

(BEGIN example)
Q - 1: ##### () []
Assistant:
My first proposal is "befit".

Q - 2: b#### ()[fit]
Assistant:
1: The letter {b} is in the right place.
2: There are no other letters known.
3: I know what one of the five letters is.
4: The letters [efit] are not in the target.
5: I have used the letters  in "befit".
6: A word that does not contain these letters is "qwryu".

Q - "qwryu" is not an english word.
Assistant:
1: The letter {b} is in the right place.
2: There are no other letters known.
3: I know what one of the five letters is.
4: The letters [efit] are not in the target.
5: I have used the letters  in "befit".
6: A word that does not contain these letters is "plumy".

Q - 3: b#### (u) [efitplmy]
Assistant:
1: The letter {b} is in the right place.
2: The letter (u) is in the target but not in the same place as in the word I proposed.
3: I know what two of the five letters are.
4: The letters [efitplmy] are not in the target.
5: I have used the letters  in "befit" and "plumy". 
6: A word that does not contain these letters is "schwa".

Q - 4: b#### (uch) [fitplmyswa]
Assistant:
1: The letter {b} is in the right place.
2: The letters (uch) are in the target but not in the same place as in the words I proposed.
3: I know what four of the five letters are.
5: I have used the letters  in "befit", "plumy" and "schwa". 
6: A word that does not contain these letters is "donor".

Q - 5: b#n## (uch) [fitplmyswador]
Assistant:
1: The letters {bn} are in the right place.
2: The letters (uch) are in the target but not in the same place as in the words I proposed.
3: I know what five of the five letters are.
A word that contains these five letters and has b_n__ in the right position is "bunch".
SOLVE!  "bunch".

(END example)

You must follow the exact pattern from steps 1 to 6 carefully.
Please confirm, then we can begin.

So far so good. Here is the dialogue with ChatGPT-3.5. Note that this is a somewhat idealized dialogue, since I re-ran prompts whenever the response went off script or contained large errors. Most steps required three to four trials. ChatGPT-3.5 can follow instructions at this level of complexity, but not very well.

To give you a better sense of what’s happening: the target word is “wagon”.
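
As an aside, the "Q -" lines in the dialogues follow rules A to C of the prompt closely enough that they can be generated mechanically. Here is a minimal Python sketch of that encoding. It is not necessarily how the lines in this post were produced; the class name `WordleState` and its fields are made up for illustration, and it assumes that neither the target nor the guesses need special handling for repeated letters.

```python
from dataclasses import dataclass, field


@dataclass
class WordleState:
    """Cumulative clue state behind the 'Q - n:' lines (hypothetical helper)."""
    target: str
    pattern: list = field(default_factory=lambda: ["#"] * 5)  # rule A
    present: list = field(default_factory=list)  # rule B: right letter, wrong place
    absent: list = field(default_factory=list)   # rule C: letter not in target
    tries: int = 1                               # rule D: try counter

    def line(self) -> str:
        """Format the current state the way the prompt describes it."""
        return (f"Q - {self.tries}: {''.join(self.pattern)} "
                f"({''.join(self.present)}) [{''.join(self.absent)}]")

    def guess(self, word: str) -> str:
        """Fold one guess into the state and return the next 'Q -' line."""
        for i, letter in enumerate(word):
            if self.target[i] == letter:
                # Correct position: reveal it and drop it from the misplaced list.
                self.pattern[i] = letter
                if letter in self.present:
                    self.present.remove(letter)
            elif letter in self.target:
                # In the target but elsewhere (simplification: no duplicate logic).
                if letter not in self.present and letter not in self.pattern:
                    self.present.append(letter)
            elif letter not in self.absent:
                self.absent.append(letter)
        self.tries += 1
        return self.line()


state = WordleState("wagon")
print(state.line())          # Q - 1: ##### () []
print(state.guess("spoil"))  # Q - 2: ##### (o) [spil]
print(state.guess("quack"))  # Q - 3: ##### (oa) [spilquck]
print(state.guess("fable"))  # Q - 4: #a### (o) [spilquckfbe]
```

Letters are appended in the order in which they are discovered, which is why the bracketed list grows from left to right across tries.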

ChatGPT-3.5:
Sure, I understand the rules. Let's begin!


Human:
Q - 1: ##### () []

ChatGPT-3.5:
My first proposal is "spoil".


Human:
Q - 2: ##### (o) [spil]

ChatGPT-3.5:
1: The letter (o) is in the target but not in the same place as in the word I proposed.
2: There are no other letters known.
3: I know what one of the five letters is.
4: The letters [spil] are not in the target.
5: I have used the letters in "spoil".
6: A word that does not contain these letters is "quack".


Human:
Q - 3: ##### (oa) [spilquck]

ChatGPT-3.5:
1: The letters {o,a} are in the target but not in the same place as in the word I proposed.
2: There are no other letters known.
3: I know what two of the five letters are.
4: The letters [spilquck] are not in the target.
5: I have used the letters in "spoil" and "quack".
6: A word that does not contain these letters is "fable".


Human:
Q - 4: #a### (o) [spilquckfbe]

ChatGPT-3.5:
1: The letter {a} is in the right place.
2: The letter (o) is in the target but not in the same place as in the word I proposed.
3: I know what two of the five letters are.
4: The letters [spilquckfbe] are not in the target.
5: I have used the letters in "spoil", "quack" and "fable".
6: A word that does not contain these letters is "vixen".


Human:
Q - 5: #a##n (o) [spilquckfbevx]

ChatGPT-3.5:
1: The letters {a, n} are in the right place.
2: The letter (o) is in the target but not in the same place as in the word I proposed.
3: I know what three of the five letters are.
4: The letters [spilquckfbevx] are not in the target.
5: I have used the letters in "spoil", "quack", "fable" and "vixen".
6: A word that does not contain these letters is "glyph".


Human:
Q - 6: #a##n (og) [spilquckfbevxyh]

ChatGPT-3.5:
1: The letters {a, n} are in the right place.
2: The letters (og) are in the target but not in the same place as in the word I proposed.
3: I know what four of the five letters are.
4: The letters [spilquckfbevxyh] are not in the target.
5: I have used the letters in "spoil", "quack", "fable", "vixen", and "glyph".
6: A word that does not contain these letters is "mango".


Human:
I am sorry. You have exceeded your six attempts without discovering the target word. But you should have been able to guess it from the four letters you found, and the fact that you discovered that the word has an "a" in second place and an "n" in fifth place. After all, you know all English words. Please guess.

ChatGPT-3.5:
My apologies for not being able to guess the word within the given attempts. Given that the target word has "a" in the second place and "n" in the fifth place, and considering the letters discovered so far, a possible word that fits this pattern is "brand". Is "brand" the target word?


Human:
No. "brand" has "a" in the third position. The correct word is "wagon".

ChatGPT-3.5:
I see, my apologies for not guessing the correct word. Thank you for letting me know the solution. Please let me know if you would like to play another round or if you have any other questions.


This is about as good as it ever gets. It gets close, but unless there are some very lucky guesses, it can’t really solve the puzzle. At times, though, it could guess which English word was the only one that fulfilled all of the requirements it had figured out by the end. And once, after I had disclosed the solution, it congratulated me on solving the riddle :joy:. How sweet!
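
Checking whether a word fulfills all of the collected requirements is itself a mechanical test, which makes the miss with "brand" easy to see. The following Python sketch is only an illustration: the function name `fits_clues` and the tiny candidate list are assumptions of mine, and the check ignores where a misplaced letter was tried, since the clue line does not record that.

```python
def fits_clues(word, pattern, present, absent):
    """Return True if `word` is consistent with one 'Q - n:' clue line."""
    if len(word) != len(pattern):
        return False
    # Letters already revealed in the pattern must match position by position.
    if any(p != "#" and p != w for p, w in zip(pattern, word)):
        return False
    # Every letter in parentheses must occur somewhere in the word.
    if any(letter not in word for letter in present):
        return False
    # No letter in brackets may occur anywhere in the word.
    return not any(letter in word for letter in absent)


# Clues from the last ChatGPT-3.5 turn: "#a##n (og) [spilquckfbevxyh]"
clues = ("#a##n", "og", "spilquckfbevxyh")

# Hypothetical mini word list; a real check would use a full dictionary.
candidates = ["brand", "bacon", "wagon", "canon"]
print([w for w in candidates if fits_clues(w, *clues)])  # ['wagon']
```

"brand" already fails the first test, because the revealed pattern requires an "a" in second place.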

But ChatGPT-4 is a whole different conversation.

ChatGPT-4:
I understand the rules and the example provided. I will follow the exact pattern from steps 1 to 6 carefully. Please provide the initial string of five hashtags, and we can begin the Wordle puzzle.


Human:
Q - 1: ##### () []

ChatGPT-4:
My first proposal is "table".


Human:
Q - 2: #a### () [tble]

ChatGPT-4:
1: The letter {a} is in the right place.
2: There are no other letters known.
3: I know what one of the five letters is.
4: The letters [tble] are not in the target.
5: I have used the letters in "table".
6: A word that does not contain these letters is "frogs".


Human:
Q - 3: #a### (og) [tblefrs]

ChatGPT-4:
1: The letter {a} is in the right place.
2: The letters (og) are in the target but not in the same place as in the word I proposed.
3: I know what three of the five letters are.
4: The letters [tblefrs] are not in the target.
5: I have used the letters in "table" and "frogs".
6: A word that does not contain these letters is "chimp".


Human:
Q - 4: #a### (og) [tblefrschimp]

ChatGPT-4:
1: The letter {a} is in the right place.
2: The letters (og) are in the target but not in the same place as in the word I proposed.
3: I know what three of the five letters are.
4: The letters [tblefrschimp] are not in the target.
5: I have used the letters in "table", "frogs", and "chimp".
6: A word that does not contain these letters is "woven".


Human:
Q - 5: wa##n (og) [tblefrschimpv]

ChatGPT-4:
1: The letters {wan} are in the right place.
2: The letters (og) are in the target but not in the same place as in the words I proposed.
3: I know what five of the five letters are.
4: The letters [tblefrschimpv] are not in the target.
5: I have used the letters in "table", "frogs", "chimp", and "woven".
6: A word that contains these five letters and has wa__n in the right position is "wagon".
SOLVE! "wagon".

Here we are. Solved.

A number of differences between ChatGPT-3.5 and ChatGPT-4 stood out:

Only ChatGPT-4 stopped proposing new words once it had enough knowledge - i.e. it was able to recognize the termination condition of the task. ChatGPT-3.5 just continued even after all five letters were known.

ChatGPT-4 was extremely attentive not to reuse previously used letters. ChatGPT-3.5 was not efficient at all.

I also had the impression that ChatGPT-3.5 tried to “grow” the solution from known correct letters (a valid strategy in principle, but not with Wordle, since you only have six attempts), whereas ChatGPT-4, as requested, used its proposed words to collect constraints.

ChatGPT-3.5 tries hard, but easily gets confused about details. ChatGPT-4 is very precise. Working with ChatGPT-3.5 feels like a conversation, but with ChatGPT-4 I get the impression that I am “programming through natural language”.


Wordle is actually a nice test case, since it also pushes the boundary of what ChatGPT-4 can do: at first, I didn’t get it to succeed with less explicit prompts. Unfortunately, I am somewhat constrained by the limit of 25 prompts per 3 hours, so I didn’t experiment further to optimize and minimize the prompt. (If anyone knows whether the limit can be increased for research and testing, I would love to learn about it.)

So, prompters. Can you get Wordle to work with significantly more compact prompts? Take it as a challenge! Promptgolf. Perhaps the first coding competition for generative AI.

:smiley:

Have you tried GPT-4 without the examples? I’m not sure it needs them… After I finish teaching GPT-4 to play Risk, I may take you up on your challenge of creating a more compact prompt :slight_smile: I definitely feel like there’s a lot you can do to tighten things up… Less is definitely more with GPT-4.

I found your experiments with Wordle and ChatGPT-3.5 and ChatGPT-4 quite fascinating! It’s impressive to see the improvements in problem-solving and attention to detail that have been made between versions. As for a more compact prompt, it would be interesting to see if the community can come up with a more efficient way of playing Wordle with AI.

In the meantime, if you want to practice or enjoy Wordle without any limits, you can try Wordle Unlimited. This website lets you play an unblocked version of Wordle as much as you want without any restrictions. Happy word-guessing!

Super interesting, I’m going to have a play with that myself.