A small collection of reasoning tasks where:
Each task requires multiple reasoning steps
Each agent handles a piece of reasoning (or critiques another agent’s reasoning)
The agents must coordinate their chains of thought to solve the problem (a minimal loop is sketched below)
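One minimal sketch of that coordination loop in Python, assuming a placeholder call_llm() in place of a real model API (swap in whatever client you use); the role prompts are illustrative, not tuned:

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned reply so the sketch runs."""
    return f"[model reply to: {prompt[:40]}...]"

ROLES = {
    "solver": "Advance the reasoning by exactly one step. State only that step.",
    "critic": "Check the most recent step for logical errors. Say VALID or explain the flaw.",
}

def collaborative_cot(problem: str, max_steps: int = 5) -> list[str]:
    """Alternate solver and critic turns over a shared chain-of-thought transcript."""
    transcript = [f"Problem: {problem}"]
    for _ in range(max_steps):
        for role, instructions in ROLES.items():
            prompt = instructions + "\n\n" + "\n".join(transcript)
            transcript.append(f"{role}: {call_llm(prompt)}")
    return transcript
```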
Example task types:
Mystery puzzles → e.g., Agent 1 lists clues, Agent 2 draws a conclusion, Agent 3 checks whether the conclusion follows logically (sketched after this list)
Math word problems → Agents break the problem into steps and verify each other's work
Ethical dilemmas → Agents debate competing chains of thought and aim for consensus
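One way the mystery-puzzle pipeline could look, reusing the hypothetical call_llm() placeholder from the sketch above; the prompts are assumptions, not tested wording:

```python
def solve_mystery(puzzle: str) -> dict[str, str]:
    """Agent 1 lists clues, Agent 2 draws a conclusion, Agent 3 verifies it."""
    clues = call_llm("List the clues in this puzzle, one per line:\n" + puzzle)
    conclusion = call_llm("Given these clues, state the conclusion they support:\n" + clues)
    verdict = call_llm(
        "Does the conclusion follow logically from the clues? "
        "Answer YES or NO with a one-line justification.\n"
        f"Clues:\n{clues}\nConclusion:\n{conclusion}"
    )
    return {"clues": clues, "conclusion": conclusion, "verdict": verdict}
```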
Deliverable:
A notebook or small app where you can:
Enter the problem
Run the agents and follow each turn of reasoning
Compare and critique their outputs (a starter entry point is sketched below)
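The deliverable could start life as a tiny command-line entry point around the collaborative_cot() sketch above; a notebook cell would read the same:

```python
if __name__ == "__main__":
    problem = input("Enter the problem: ")
    for turn in collaborative_cot(problem):
        print(turn)  # inspect each agent turn, then compare and critique
```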
Your research questions (an instrumentation sketch follows this list):
Where do agents break down in collaborative chain-of-thought?
Do chain-of-thought prompts reduce errors in multi-agent reasoning?
How can agents better critique and correct each other’s steps?
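A hedged sketch of one way to instrument these questions, reusing the hypothetical call_llm() placeholder from above: log every critic rejection so you can see where, and how often, collaborative reasoning breaks down.

```python
def critique_and_revise(problem: str, max_rounds: int = 3) -> tuple[str, list[str]]:
    """Solver proposes a solution; critic accepts or flags a flaw; flaws are logged."""
    breakdowns: list[str] = []
    answer = call_llm("Solve step by step:\n" + problem)
    for _ in range(max_rounds):
        critique = call_llm("Name the first flawed step, or say VALID:\n" + answer)
        if critique.strip().upper().startswith("VALID"):
            break
        breakdowns.append(critique)  # raw material for "where do agents break down?"
        answer = call_llm(
            f"Revise the solution to fix this flaw:\n{critique}\n\nSolution:\n{answer}"
        )
    return answer, breakdowns
```

Counting and categorizing the logged breakdowns gives a concrete handle on the first question; comparing runs with and without the critic speaks to the second.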