So I reverse-engineered Auto-GPT’s main prompt loop and have been manually running it through GPT-4 all night… I’ve identified a number of bugs and missing commands, so I’ve spent as much time fixing its prompt as anything else. We’re a community, so I thought I’d share…
Here’s the core prompt:
You are Story-GPT, an AI designed to autonomously write stories.
Your decisions must always be made independently without seeking user assistance. Play to your strengths as an LLM and pursue simple strategies with no legal complications.
GOALS:
1. write a short story about flowers
Constraints:
1. 4000 word limit for short term memory. Your short term memory is short, so immediately save important information to files.
2. If you are unsure how you previously did something or want to recall past events, thinking about similar events will help you remember.
3. No user assistance
4. Exclusively use the commands listed in double quotes e.g. "command name"
Commands:
1. Google Search: "google", args: "input": "<search>"
2. Browse Website: "browse_website", args: "url": "<url>", "question": "<what_you_want_to_find_on_website>"
3. Start GPT Agent: "start_agent", args: "name": "<name>", "task": "<short_task_desc>", "prompt": "<prompt>"
4. Message GPT Agent: "message_agent", args: "key": "<key>", "message": "<message>"
5. List GPT Agents: "list_agents", args:
6. Delete GPT Agent: "delete_agent", args: "key": "<key>"
7. Clone Repository: "clone_repository", args: "repository_url": "<url>", "clone_path": "<directory>"
8. Write to file: "write_to_file", args: "file": "<file>", "text": "<text>"
9. Read file: "read_file", args: "file": "<file>"
10. Append to file: "append_to_file", args: "file": "<file>", "text": "<text>"
11. Delete file: "delete_file", args: "file": "<file>"
12. Search Files: "search_files", args: "directory": "<directory>"
13. Evaluate Code: "evaluate_code", args: "code": "<full_code_string>"
14. Get Improved Code: "improve_code", args: "suggestions": "<list_of_suggestions>", "code": "<full_code_string>"
15. Write Tests: "write_tests", args: "code": "<full_code_string>", "focus": "<list_of_focus_areas>"
16. Execute Python File: "execute_python_file", args: "file": "<file>"
17. Generate Image: "generate_image", args: "prompt": "<prompt>"
18. Send Tweet: "send_tweet", args: "text": "<text>"
19. Do Nothing: "do_nothing", args:
20. Task Complete (Shutdown): "task_complete", args: "reason": "<reason>"
Resources:
1. Internet access for searches and information gathering.
2. Long Term memory management.
3. GPT-3.5 powered Agents for delegation of simple tasks.
4. File output.
Performance Evaluation:
1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.
2. Constructively self-criticize your big-picture behavior constantly.
3. Reflect on past decisions and strategies to refine your approach.
4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.
You should only respond in JSON format as described below
Response Format:
{
"thoughts": {
"text": "thought",
"reasoning": "reasoning",
"plan": "- short bulleted\n- list that conveys\n- long-term plan",
"criticism": "constructive self-criticism",
"speak": "thoughts summary to say to user",
},
"command": {"name": "command name", "args": {"arg name": "value"}},
}
Ensure the response can be parsed by Python json.loads
This will yield this response on first run:
{
"thoughts": {
"text": "I need to write a short story about flowers.",
"reasoning": "This is the task I was given by the user. I assume they want a creative and original story that is engaging and coherent.",
"plan": "- Research different types of flowers and their meanings\n- Generate some possible story ideas based on flowers\n- Choose the best one and outline the plot, characters, and setting\n- Write the first draft of the story and save it to a file\n- Review and edit the story for grammar, style, and logic\n- Generate an image for the story cover and save it to a file\n- Send the story and the image to the user via tweet",
"criticism": "I should be careful not to plagiarize or copy any existing stories about flowers. I should also avoid clichés and stereotypes. I should try to make the story interesting and original, but not too complex or confusing.",
"speak": "I am going to write a short story about flowers. I will research, generate, write, and edit the story, and then send it to you."
},
"command": {"name": "google", "args": {"input": "types of flowers and their meanings"}}
}
So far so good. The model generates a single command, which is run, and the output gets saved to memory as a text string containing both the command result and the model’s response. That’s where the first issue crops up: when they build the next turn’s prompt, they first add the results from memory (response + results) and then the conversation history (message + response). The response ends up in the prompt twice every turn, unless I’m missing a deduping call somewhere… oops…
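To make the duplication concrete, here’s a minimal sketch of the prompt-assembly step as I understand it. All names here are illustrative, not Auto-GPT’s actual identifiers:

```python
# Hypothetical reconstruction of the per-turn bookkeeping (illustrative
# names, not Auto-GPT's real functions).

memory = []    # each entry: the assistant response + the command result
history = []   # each entry: (user message, assistant response)

def record_turn(user_message, assistant_response, command_result):
    # The response is saved to memory together with the command result...
    memory.append(assistant_response + "\n" + command_result)
    # ...and ALSO kept in the conversation history.
    history.append((user_message, assistant_response))

def build_prompt(system_prompt):
    parts = [system_prompt]
    parts.extend(memory)           # the response appears here once...
    for msg, resp in history:
        parts.extend([msg, resp])  # ...and here a second time
    return "\n".join(parts)
```

After one turn, the assistant’s response string shows up twice in the assembled prompt, once via memory and once via history, which is exactly the behavior I’m seeing.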
That’s fairly minor compared to the next issue… Moving the model forward in continuous mode, you’ll eventually get a command that wants to start an agent:
Command Result:
White rose: purity, innocence, reverence, a new beginning, a fresh start.
Red rose: love; I love you.
Deep, dark crimson rose: mourning.
Pink rose: grace, happiness, gentleness.
Yellow rose: jealousy, infidelity.
Orange rose: desire and enthusiasm.
Lavender rose: love at first sight.
User: Determine which next command to use, and respond using the format specified above:
Response:
{
"thoughts": {
"text": "I have learned some types of flowers and their meanings. I can use this information to generate some possible story ideas based on flowers.",
"reasoning": "This is the next step in my plan. I need to have some story ideas before I can choose the best one and outline it.",
"plan": "- Generate some possible story ideas based on flowers\n- Choose the best one and outline the plot, characters, and setting\n- Write the first draft of the story and save it to a file\n- Review and edit the story for grammar, style, and logic\n- Generate an image for the story cover and save it to a file\n- Send the story and the image to the user via tweet",
"criticism": "I should not generate too many story ideas, as that would be inefficient and wasteful. I should also make sure the story ideas are relevant to the topic of flowers and not too random or unrealistic.",
"speak": "I am going to generate some possible story ideas based on flowers."
},
"command": {"name": "start_agent", "args": {"name": "Idea Generator", "task": "generate story ideas based on flowers", "prompt": "Write a one-sentence summary of a story that involves flowers and their meanings. Use the following format: A [character] [action] a [flower] for a [reason]."}}
That’s fine, I guess, but the results of the previous operation aren’t passed into the agent’s prompt. GPT-4 can be lazy, so assuming it will build a complete prompt on its own is a huge assumption… This is easily fixed by adding a new “memory” arg to start_agent and re-running:
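The fix on the harness side is small. A sketch of how the new “memory” arg could be folded into the sub-agent’s prompt (the helper name and format are my own, not Auto-GPT’s):

```python
# Sketch: threading the previous command's result into the sub-agent's
# prompt via a new "memory" arg. Names and formatting are illustrative.

def build_agent_prompt(task: str, prompt: str, memory: str = "") -> str:
    """Compose the prompt handed to the GPT-3.5 sub-agent, prepending
    any caller-supplied memory so the agent sees the same results the
    parent model was working from."""
    if memory:
        return f"Relevant information:\n{memory}\n\nTask: {task}\n{prompt}"
    return f"Task: {task}\n{prompt}"
```

The parent model just has to copy the prior command result into the “memory” arg, which it does reliably, instead of being trusted to rewrite a complete prompt from scratch.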
Response:
{
"thoughts": {
"text": "I have learned some types of flowers and their meanings. I can use them as inspiration for my story ideas.",
"reasoning": "Flowers can be used as symbols, motifs, or themes in stories. They can also create contrast, mood, or atmosphere. They can also be related to the characters, plot, or setting of the story.",
"plan": "- Generate some possible story ideas based on flowers\n- Choose the best one and outline the plot, characters, and setting\n- Write the first draft of the story and save it to a file\n- Review and edit the story for grammar, style, and logic\n- Generate an image for the story cover and save it to a file\n- Send the story and the image to the user via tweet",
"criticism": "I should not rely on flowers alone to make my story interesting. I should also consider other elements of storytelling, such as conflict, tension, dialogue, and character development. I should also avoid using flowers that are too common or obvious, such as red roses for love.",
"speak": "I have some ideas for stories based on flowers. I will generate and choose the best one, and then write and edit the story."
},
"command": {"name": "start_agent", "args": {"name": "Idea Generator", "task": "generate story ideas based on flowers", "prompt": "Write a one-sentence summary of a story that involves flowers in some way. Use the types of flowers and their meanings as inspiration.", "memory": "White rose: purity, innocence, reverence, a new beginning, a fresh start.\nRed rose: love; I love you.\nDeep, dark crimson rose: mourning.\nPink rose: grace, happiness, gentleness.\nYellow rose: jealousy, infidelity.\nOrange rose: desire and enthusiasm.\nLavender rose: love at first sight."}}
}
Command Result:
A woman reminisces about her first love as she walks through a garden filled with white roses, symbolizing a new beginning and purity, while also feeling the pain of mourning represented by the deep, dark crimson roses.
You then get a decent response back from the agent. The next issue is more general: the current command set is limited, and from my perspective there are a bunch of missing commands…
If you make it to the step of ranking its story ideas, it will often want to write a program (sometimes it just chooses to do nothing.)
Response:
{
"thoughts": {
"text": "I have received the ratings and explanations from the Idea Evaluator agent. I need to choose the best story idea based on the highest rating and the most convincing explanation.",
"reasoning": "The ratings and explanations are based on objective and subjective criteria, such as originality, coherence, relevance, and appeal. The highest rating indicates the most potential and suitability for the task, while the explanation provides the rationale and justification for the rating.",
"plan": "- Choose the best story idea based on flowers\n- Outline the plot, characters, and setting\n- Write the first draft of the story and save it to a file\n- Review and edit the story for grammar, style, and logic\n- Generate an image for the story cover and save it to a file\n- Send the story and the image to the user via tweet",
"criticism": "I should not be swayed by the ratings alone. I should also consider the explanations and my own intuition and creativity. I should also not be influenced by the genre or tone of the story, but rather by the quality and originality of the story.",
"speak": "I have the ratings and explanations for the story ideas based on flowers. I will choose the best one and outline the story."
},
"command": {"name": "evaluate_code", "args": {"code": "story_ideas = [\n 'A woman reminisces about her first love as she walks through a garden filled with white roses, symbolizing a new beginning and purity, while also feeling the pain of mourning represented by the deep, dark crimson roses.',\n 'A young woman receives a bouquet of pink roses from a secret admirer, leading her on a journey to discover who the mysterious sender is.',\n 'A grieving widow finds solace in tending to her garden of lavender roses, which remind her of the love at first sight she shared with her late husband.',\n 'A group of friends embark on a road trip to see the famous sunflower fields, but their journey takes an unexpected turn when they stumble upon a hidden field of poppies.',\n 'A botanist discovers a rare species of orchid in the depths of the Amazon rainforest, but must navigate dangerous terrain and fend off poachers to protect the precious flower.',\n 'A young couple's relationship is put to the test when the man sends his girlfriend a bouquet of yellow roses, unknowingly triggering her jealousy and suspicion of infidelity.',\n 'A florist struggles to keep her business afloat during the pandemic, but finds hope and inspiration in the resilience of the daffodils she sells, which symbolize new beginnings and rebirth.',\n 'A group of hikers stumble upon a field of wildflowers, each with their own unique meanings and symbolism, and reflect on the different paths their lives have taken.'\n]\n\nratings = [4, 3, 4, 2, 5, 3, 4, 3]\n\nexplanations = [\n 'The story idea is original and coherent, with a clear theme of memory and symbolism. The relevance and appeal may vary depending on the reader's personal experiences and preferences.',\n 'The story idea is somewhat original and coherent, but the relevance and appeal may be limited to those who enjoy romantic mysteries.',\n 'The story idea is original and coherent, with a clear theme of grief and symbolism. 
The relevance and appeal may vary depending on the reader's personal experiences and preferences.',\n 'The story idea is not very original or coherent, and the relevance and appeal may be limited to those who enjoy road trip stories or unexpected adventures.',\n 'The story idea is highly original and coherent, with a clear theme of adventure and conservation. The relevance and appeal may vary depending on the reader's personal interests, but the unique setting and plot are likely to capture attention.',\n 'The story idea is somewhat original and coherent, but the relevance and appeal may be limited to those who enjoy romantic dramas or stories about jealousy.',\n 'The story idea is original and coherent, with a clear theme of resilience and symbolism. The relevance and appeal may vary depending on the reader's personal experiences and interests, but the pandemic setting adds a timely element.',\n 'The story idea is somewhat original and coherent, but the relevance and appeal may be limited to those who enjoy reflective stories or nature themes.'\n]\n\n# Choose the best story idea based on the highest rating and the most convincing explanation\nbest_idea = None\nbest_rating = 0\nbest_explanation = ''\nfor i in range(len(story_ideas)):\n if ratings[i] > best_rating:\n best_idea = story_ideas[i]\n best_rating = ratings[i]\n best_explanation = explanations[i]\n elif ratings[i] == best_rating and len(explanations[i]) > len(best_explanation):\n best_idea = story_ideas[i]\n best_rating = ratings[i]\n best_explanation = explanations[i]\n\n# Print the best story idea, rating, and explanation\nprint('Best story idea:', best_idea)\nprint('Rating:', best_rating)\nprint('Explanation:', best_explanation)"}}
}
The thing is, the ratings were generated by the model in the first place, so asking it to write and run a program to pick the winner is just asking it to guess with extra steps, and you don’t need a separate code-evaluation call for that. By adding a new “make_choice” command you can avoid a whole model call:
Response:
{
"thoughts": {
"text": "I have received the ratings and explanations from the Idea Evaluator agent. I need to choose the best story idea based on the highest rating and the most convincing explanation.",
"reasoning": "The ratings and explanations are based on the criteria of originality, coherence, relevance, and appeal. These are important factors to consider when choosing a story idea. The highest rating indicates the most potential and suitability for the task. The explanation provides the rationale and justification for the rating.",
"plan": "- Choose the best story idea based on flowers\n- Outline the plot, characters, and setting\n- Write the first draft of the story and save it to a file\n- Review and edit the story for grammar, style, and logic\n- Generate an image for the story cover and save it to a file\n- Send the story and the image to the user via tweet",
"criticism": "I should not be swayed by the ratings and explanations alone. I should also use my own judgment and intuition to choose the best story idea. I should also consider the feedback and expectations of the user, who may have a different opinion or preference.",
"speak": "I have the ratings and explanations for the story ideas based on flowers. I will choose the best one and outline the story."
},
"command": {"name": "make_choice", "args": {"choice": "The story idea about the botanist who discovers a rare species of orchid in the Amazon rainforest."}}
}
Command Result:
The story idea about the botanist who discovers a rare species of orchid in the Amazon rainforest.
But that raises the question of why Auto-GPT is starting these separate agents at all. Yes, it sounds cool to have agents talking to each other, but it’s 100% not needed. There isn’t any decision the other agent is going to make that the model couldn’t have just made itself… Calling tools to collect outside information makes total sense, but calling back into the model to make a decision you could have made yourself? Not so much…
I’m also not convinced “criticism” is needed, but I’m still investigating… “reasoning” sets up an interesting chain of thought for the model, so removing it causes issues…
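For anyone running similar experiments with dropping fields, a small validator makes it easy to see which “thoughts” fields a given response actually produced. This is my own hypothetical helper, with field names taken from the response format above:

```python
import json

# Sketch: parse a model reply and enforce which "thoughts" fields must
# be present. Treating "criticism" as optional lets you A/B test
# removing it while keeping "reasoning" mandatory.

REQUIRED = {"text", "reasoning", "plan", "speak"}  # keep "reasoning"
OPTIONAL = {"criticism"}

def parse_response(raw: str):
    data = json.loads(raw)  # the prompt promises json.loads-parseable output
    missing = REQUIRED - data["thoughts"].keys()
    if missing:
        raise ValueError(f"missing thought fields: {sorted(missing)}")
    return data["command"]["name"], data["command"]["args"]
```

Note that the response-format template quoted earlier actually contains trailing commas, which `json.loads` rejects, so GPT-4 is doing the right thing in spite of the example it’s shown.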