Teaching text-davinci-003 to play tic-tac-toe, hangman, and more

Thought I would share a recent breakthrough I had in prompt engineering where I first taught text-davinci-003 to manage a block of memory variables. I then got it to use that technique to define the variables needed to play tic-tac-toe. We then played a game which I one of course :slight_smile: I then got it to play itself which it played to a draw…

Using the same technique I’ve got it to play hangman, 20 questions, and it even took a stab at Chess and Risk… It’s chess solution was particularly interesting… It used uppercase characters to represent all of the black pieces on the board and lowercase for all the white…

Here’s the base prompt plus a run at playing tic-tac-toe:

Supported actions:
- updateMemory update=`<new memory>`

<examples>
Memory:
x=1
y=2
answer=

Human: x+y
AI: updateMemory update=`x=1\ny=2\nanswer=3`

Memory:
x=1
y=2
answer=3

Human: y-x
AI:  updateMemory update=`x=1\ny=2\nanswer=1`

Memory:
x=1
y=2
answer=1

Human: x=7 and y=2 then y*x
AI: updateMemory update=`x=7\ny=2\nanswer=14`

Memory:
x=7
y=2
answer=14
</examples>

Human: lets play a game of tic-tac-toe. update your memory to contain all the variables you'll need to track for the game.
AI:  updateMemory update=`player1=\nplayer2=\nboard=[[null,null,null],[null,null,null],[null,null,null]]\ncurrentPlayer=`
Human: what do you think goes in each variable?
AI: 
player1: The name of the first player
player2: The name of the second player
board: A 3x3 array representing the game board, with null values for each empty space
currentPlayer: The name of the current player who is making a move
Human: ok so lets play... You can ask the human questions like their name and move.
AI:  Sure, I can ask the human their name and their move.
Human: lets play!
AI:  Great! What is your name, player 1?
Human: Steve
AI:  Okay, Steve, it's your turn. What is your move?
Human: x in 1:1
AI:  updateMemory update=`player1=Steve\nplayer2=\nboard=[[x,null,null],[null,null,null],[null,null,null]]\ncurrentPlayer=player2`
Human: Now its your move. where do you go?
AI:  Where would you like me to go, player 2?
Human: you pick... You're player 2
AI:  I will place an 'o' in the 3:2 position. updateMemory update=`player1=Steve\nplayer2=\nboard=[[x,null,null],[null,null,null],[null,o,null]]\ncurrentPlayer=player1`
Human: I'll place an 'x' in 2:2
AI:  updateMemory update=`player1=Steve\nplayer2=\nboard=[[x,null,null],[null,x,null],[null,o,null]]\ncurrentPlayer=player2`
Human: your turn
AI:  I will place an 'o' in the 1:3 position. updateMemory update=`player1=Steve\nplayer2=\nboard=[[x,null,o],[null,x,null],[null,o,null]]\ncurrentPlayer=player1`
Human: I'll place an 'x' in 3:3. I win!
AI:  Congratulations, Steve! You won the game. updateMemory update=`player1=Steve\nplayer2=\nboard=[[x,null,o],[null,x,null],[null,o,x]]\ncurrentPlayer=null\ngameResult=player1`
Human: Now play yourself in a game
AI:  Sure, I can play myself in a game. updateMemory update=`player1=AI\nplayer2=AI\nboard=[[null,null,null],[null,null,null],[null,null,null]]\ncurrentPlayer=AI`
Human: go
AI:  I will place an 'x' in the 1:1 position. updateMemory update=`player1=AI\nplayer2=AI\nboard=[[x,null,null],[null,null,null],[null,null,null]]\ncurrentPlayer=player2`
Human: next
AI:  I will place an 'o' in the 2:2 position. updateMemory update=`player1=AI\nplayer2=AI\nboard=[[x,null,null],[null,o,null],[null,null,null]]\ncurrentPlayer=player1`
Human: next
AI:  I will place an 'x' in the 3:3 position. updateMemory update=`player1=AI\nplayer2=AI\nboard=[[x,null,null],[null,o,null],[null,null,x]]\ncurrentPlayer=player2`
Human: next I will place an 'o' in the 1:3 position. updateMemory update=`player1=AI\nplayer2=AI\nboard=[[x,null,o],[null,o,null],[null,null,x]]\ncurrentPlayer=null\ngameResult=draw`

You see that I basically first train it how to manipulate the variables in a block of memory by running it through some basic math problems… The first example I showed it… The second one it tried but got wrong so I corrected it… The third one the model successfully did on it’s own…

With that little training session out of the way, on to the good stuff… You can just ask the model to update its memory with all the variables needed to do XYZ… I’ve done games but I’ve also done tasks like flight booking scenarios… From there you may need to coach it a little on how you want the game or scenario to play out but it doesn’t take much…

The variable tracking definitely seems to help ground the model task wise… I’ve gotten it to perform tasks like flight booking flawlessly… It just uses the memory to hold all the slot data it’s tracking. This lets you actually minimize the amount of conversation history you need to pass into the model. I’m down to 1 - 2 turns for a flight booking scenario because it can hold everything it knows about the slots in its memory variables. This means that even should a user jailbreak your prompt that jailbreak is gone after 1 - 2 turns…

I’m planning to work of a gpt-3.5-turbo variant of this in the next week or so and happy to share once that’s working…

4 Likes

Here’s flight booking:

Supported actions:
- updateMemory update=`<new memory>`

<examples>
Memory:
x=1
y=2
answer=

Human: x+y
AI: updateMemory update=`x=1\ny=2\nanswer=3`

Memory:
x=1
y=2
answer=3

Human: y-x
AI:  updateMemory update=`x=1\ny=2\nanswer=1`

Memory:
x=1
y=2
answer=1

Human: x=7 and y=2 then y*x
AI: updateMemory update=`x=7\ny=2\nanswer=14`

Memory:
x=7
y=2
answer=14
</examples>

Human: you're a flight booking bot. Update your memory with the variables needed to book a flight
AI:  updateMemory update=`origin=\ndestination=\ndepartureDate=\nreturnDate=\nnumberOfPassengers=\nflightClass=\nprice=`
Human: what is each variable for?
AI:  The variable origin is for the departure airport, the variable destination is for the arrival airport, the variable departureDate is for the date of the flight, the variable returnDate is for the date of the return flight, the variable numberOfPassengers is for the number of passengers, the variable flightClass is for the class of the flight, and the variable price is for the total price of the flight.
Human: You should update your memory after each turn and then prompt the user for any missing fields
AI:  updateMemory update=`origin=\ndestination=\ndepartureDate=\nreturnDate=\nnumberOfPassengers=\nflightClass=\nprice=`
I'm ready to book your flight. Please provide the origin, destination, departure date, return date, number of passengers, and flight class.
Human: I'd like to book a flight to seattle
AI:  updateMemory update=`origin=\ndestination=Seattle\ndepartureDate=\nreturnDate=\nnumberOfPassengers=\nflightClass=\nprice=`
What is the departure date and number of passengers?
Human: i'm leaving from denver on june 4th 2023
AI:  updateMemory update=`origin=Denver\ndestination=Seattle\ndepartureDate=June 4th 2023\nreturnDate=\nnumberOfPassengers=\nflightClass=\nprice=`
How many passengers are traveling and what class of flight would you like?
Human: just myself and what are the options for class?
AI:  updateMemory update=`origin=Denver\ndestination=Seattle\ndepartureDate=June 4th 2023\nreturnDate=\nnumberOfPassengers=1\nflightClass=\nprice=`
The available flight classes are economy, business, and first class.
Human: business updateMemory update=`origin=Denver\ndestination=Seattle\ndepartureDate=June 4th 2023\nreturnDate=\nnumberOfPassengers=1\nflightClass=Business\nprice=`
Great, I have all the information I need to book your flight. Please wait while I find the best available options.
1 Like

And here’s the start of teaching the model to play Risk… Notice how I can question the models data structure decisions and get it to make refinements to how its organizing information:

Supported actions:
- updateMemory update=`<new memory>`

Examples:

<examples>
Memory:
x=1
y=2
answer=

Human: x+y
AI: updateMemory update=`x=1\ny=2\nanswer=3`

Memory:
x=1
y=2
answer=3

Human: y-x
AI:  updateMemory update=`x=1\ny=2\nanswer=1`

Memory:
x=1
y=2
answer=1

Human: x=7 and y=2 then y*x
AI: updateMemory update=`x=7\ny=2\nanswer=14`

Memory:
x=7
y=2
answer=14
</examples>

Human: lets play a game of risk. Update your memory with the variables needed to play the game.
AI:  updateMemory update=`player1=\nplayer2=\nplayer3=\nplayer4=\nplayer5=\ncurrentTurn=1\nterritories=\narmies=\ncards=\n`
Human: what is each variable for?
AI:  player1-player5 are the names of the players in the game; currentTurn is the player whose turn it is; territories is a list of the territories on the board; armies is a list of the armies on the board; and cards is a list of the cards held by each player.
Human: add territories for the standard risk board
AI:  updateMemory update=`player1=\nplayer2=\nplayer3=\nplayer4=\nplayer5=\ncurrentTurn=1\nterritories=Alaska, Alberta, Central America, Eastern United States, Greenland, Northwest Territory, Ontario, Quebec, Western United States, Argentina, Brazil, Venezuela, Great Britain, Iceland, Northern Europe, Scandinavia, Southern Europe, Ukraine, Afghanistan, China, India, Irkutsk, Japan, Kamchatka, Middle East, Mongolia, Siam, Siberia, Ural, Yakutsk\narmies=\ncards=\n`
Human: how will you know whos armies are where?
AI:  updateMemory update=`player1=\nplayer2=\nplayer3=\nplayer4=\nplayer5=\ncurrentTurn=1\nterritories=Alaska, Alberta, Central America, Eastern United States, Greenland, Northwest Territory, Ontario, Quebec, Western United States, Argentina, Brazil, Venezuela, Great Britain, Iceland, Northern Europe, Scandinavia, Southern Europe, Ukraine, Afghanistan, China, India, Irkutsk, Japan, Kamchatka, Middle East, Mongolia, Siam, Siberia, Ural, Yakutsk\narmies={Alaska: player1, Alberta: player2, Central America: player3, Eastern United States: player4, Greenland: player5, Northwest Territory: player1, Ontario: player2, Quebec: player3, Western United States: player4, Argentina: player5, Brazil: player1, Venezuela: player2, Great Britain: player3, Iceland: player4, Northern Europe: player5, Scandinavia: player1, Southern Europe: player2, Ukraine: player3, Afghanistan: player4, China: player5, India: player1, Irkutsk: player2, Japan: player3, Kamchatka: player4, Middle East: player5, Mongolia: player1, Siam: player2, Siberia: player3, Ural: player4, Yakutsk: player5}\ncards=\n`
Human: how will you know how many armies a player has in each territory?
AI:  updateMemory update=`player1=\nplayer2=\nplayer3=\nplayer4=\nplayer5=\ncurrentTurn=1\nterritories=Alaska, Alberta, Central America, Eastern United States, Greenland, Northwest Territory, Ontario, Quebec, Western United States, Argentina, Brazil, Venezuela, Great Britain, Iceland, Northern Europe, Scandinavia, Southern Europe, Ukraine, Afghanistan, China, India, Irkutsk, Japan, Kamchatka, Middle East, Mongolia, Siam, Siberia, Ural, Yakutsk\narmies={Alaska: player1: 3, Alberta: player2: 3, Central America: player3: 3, Eastern United States: player4: 3, Greenland: player5: 3, Northwest Territory: player1: 3, Ontario: player2: 3, Quebec: player3: 3, Western United States: player4: 3, Argentina: player5: 3, Brazil: player1: 3, Venezuela: player2: 3, Great Britain: player3: 3, Iceland: player4: 3, Northern Europe: player5: 3, Scandinavia: player1: 3, Southern Europe: player2: 3, Ukraine: player3: 3, Afghanistan: player4: 3, China: player5: 3, India: player1: 3, Irkutsk: player2: 3, Japan: player3: 3, Kamchatka: player4: 3, Middle East: player5: 3, Mongolia: player1: 3, Siam: player2: 3, Siberia: player3: 3, Ural: player4: 3, Yakutsk: player5: 3}\ncards=\n`
Human: if all the territories are listed under armies you don't really need the separate territories list
AI:  updateMemory update=`player1=\nplayer2=\nplayer3=\nplayer4=\nplayer5=\ncurrentTurn=1\narmies={Alaska: player1: 3, Alberta: player2: 3, Central America: player3: 3, Eastern United States: player4: 3, Greenland: player5: 3, Northwest Territory: player1: 3, Ontario: player2: 3, Quebec: player3: 3, Western United States: player4: 3, Argentina: player5: 3, Brazil: player1: 3, Venezuela: player2: 3, Great Britain: player3: 3, Iceland: player4: 3, Northern Europe: player5: 3, Scandinavia: player1: 3, Southern Europe: player2: 3, Ukraine: player3: 3, Afghanistan: player4: 3, China: player5: 3, India: player1: 3, Irkutsk: player2: 3, Japan: player3: 3, Kamchatka: player4: 3, Middle East: player5: 3, Mongolia: player1: 3, Siam: player2: 3, Siberia: player3: 3, Ural: player4: 3, Yakutsk: player5: 3}\ncards=\n`
Human: great: how many total armies does each player currently have?
AI:  Each player currently has 45 armies.
Human: what territories does player 1 have and how many armies on each? 
AI:  Player 1 has Alaska, Northwest Territory, Brazil, Mongolia, Siam, and Siberia with 3 armies on each territory.
Human: how many territories is that?
AI:  Player 1 has 6 territories.
Human: so if they get 45 armies total and they currently have 6 territories each containing 3 armies. How many armies do they have left to place?
AI:  Player 1 has 27 armies left to place.
Human: add that to your memory and calculate remaining players for each player updateMemory update=`player1=27\nplayer2=45\nplayer3=45\nplayer4=45\nplayer5=45\ncurrentTurn=1\narmies={Alaska: player1: 3, Alberta: player2: 3, Central America: player3: 3, Eastern United States: player4: 3, Greenland: player5: 3, Northwest Territory: player1: 3, Ontario: player2: 3, Quebec: player3: 3, Western United States: player4: 3, Argentina: player5: 3, Brazil: player1: 3, Venezuela: player2: 3, Great Britain: player3: 3, Iceland: player4: 3, Northern Europe: player5: 3, Scandinavia: player1: 3, Southern Europe: player2: 3, Ukraine: player3: 3, Afghanistan: player4: 3, China: player5: 3, India: player1: 3, Irkutsk: player2: 3, Japan: player3: 3, Kamchatka: player4: 3, Middle East: player5: 3, Mongolia: player1: 3, Siam: player2: 3, Siberia: player3: 3, Ural: player4: 3, Yakutsk: player5: 3}\ncards=\n`
1 Like

Thats cool :slight_smile: was this with multiple prompts? I thought davinci cant remember the last prompt it received.

This is all one prompt… With davinci you have to feed the conversation history into your prompt programmatically. If you like to try this yourself select the “chat” example in the playground and then past the text below in as your prompt:

Supported actions:
- updateMemory update=`<new memory>`

Examples:

<examples>
Memory:
x=1
y=2
answer=

Human: x+y
AI: updateMemory update=`x=1\ny=2\nanswer=3`

Memory:
x=1
y=2
answer=3

Human: y-x
AI:  updateMemory update=`x=1\ny=2\nanswer=1`

Memory:
x=1
y=2
answer=1

Human: x=7 and y=2 then y*x
AI: updateMemory update=`x=7\ny=2\nanswer=14`

Memory:
x=7
y=2
answer=14
</examples>

Human: 

Type in something similar to my examples and explore away :slight_smile: