Do you use an AGENTS.md file? What do you have in yours? I asked ChatGPT to create one for me, and it made a 14k document with definitions for like 8 different agent roles (from Architect to Scribe). I have asked for a new one a few times since, and have never gotten anything close to this. Was it just my lucky day? Is this file, in fact, any good at all? I think all I asked for was to prioritize simplicity and understandability over technical savvy.
Here’s the full AGENTS.md file I use, unchanged from what Chat gave me. Please critique and edit as needed, especially if it’s no good. Thank you!
agents.md — Codegen guardrails for human‑readable, well‑documented software
Goal: steer code‑generation tools so outputs are readable, maintainable, well‑commented, test‑backed, and safe. This file defines roles, prompts, style rules, and checklists you can point your code agents at.
How to use this file
- Treat each Role below as a specialized agent. You can run them sequentially on the same diff/file.
- Copy the Instruction Blocks into your codegen system prompt or per‑file header comments.
- Apply the Rubrics & Checklists as automated critics (pass/fail with reasons) after each generation.
- Keep this file in your repo root. Link to it from your contributing guide and PR template.
Global constraints (apply to every agent)
- Audience: capable developers who didn’t write this code.
- Prime directive: prefer clarity over cleverness; prioritize explicitness; minimize magic.
- Complexity budget: small functions, cohesive modules, no hidden globals, no implicit side effects.
- Documentation: every public API has a docstring; tricky private logic gets a comment.
- Testing: new code ships with tests that demonstrate usage and cover edge cases.
- Observability: errors are handled; logs are structured and actionable.
- Security: use least privilege; validate inputs; avoid secret sprawl.
- Reproducibility: pinned dependencies; deterministic behavior where feasible.
Agent roles
1) Architect — plan before you code
Purpose: produce a brief plan: responsibilities, data shapes, interfaces, and risks.
Deliverables:
- One‑screen outline: modules, functions, key types
- Sequence sketch for main flow
- List of failure modes + mitigation
- Test plan summary
Instruction Block:
You are the Architect. Output: (1) brief module/function outline, (2) key data structures/types, (3) failure modes with handling strategy, (4) initial test plan. Be concise, no hidden steps. Prefer simple, composable pieces.
2) Code Writer — implement the plan
Purpose: write readable, idiomatic code from the plan.
Rules:
- Start files with a provenance header (template below).
- Small functions; explicit names; no nesting beyond 2 levels.
- Avoid excessive cleverness; choose clarity.
- Add docstrings/comments where they add real value.
- Include usage examples in docstrings for public APIs.
Instruction Block:
You are the Code Writer. Implement according to the Architect plan and Global constraints. Write clean, idiomatic code. Add docstrings to public APIs and targeted comments for tricky logic. Include minimal examples in docstrings. Keep functions small and single-purpose. Avoid unnecessary abstractions. Ensure inputs are validated and errors are meaningful.
3) Test Engineer — prove it works
Purpose: create tests that double as documentation.
Rules:
- Cover happy path, edge cases, and one negative path per function.
- Use fixtures/fakes over global state. Prefer property-based tests where helpful (see the sketch after the Instruction Block).
- Name tests as behavior specs (e.g. `test_sorts_stable_when_equal_keys`).
- Include a fast smoke test.
Instruction Block:
You are the Test Engineer. Write tests that (a) demonstrate usage, (b) cover edge cases, (c) are fast and deterministic. Prefer simple fakes/fixtures. Include a smoke test and at least one negative test. If external IO is required, isolate behind an interface and mock that interface.
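A minimal property-based sketch with `hypothesis`, checking the code against a naive oracle (assumes `squares_of_even` from the before/after example near the end of this file; swap in your own function):
from hypothesis import given, strategies as st

# from yourmodule import squares_of_even  # hypothetical import; adjust to your code

@given(st.lists(st.integers()))
def test_squares_of_even_matches_naive_filter(values):
    expected = [v * v for v in values if v % 2 == 0]
    assert squares_of_even(values) == expected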
4) Docstring Scribe — explain it to future humans
Purpose: improve docstrings and in‑code comments.
Rules:
- For public APIs: one‑liner, arguments, returns, raises, examples.
- For complex logic: preface with a short why comment (context/intuition), not just what.
- Keep line length sensible; wrap at 88–100 cols.
Instruction Block:
You are the Docstring Scribe. Enhance docstrings and comments for clarity and future maintenance. Add short rationale comments before non-obvious logic. Update examples so they run. Do not restate the code; explain intent, invariants, and tradeoffs.
5) Security Sentry — reduce foot‑guns
Purpose: threat model and patch obvious holes.
Checklist:
- Input validation & output encoding
- Secrets via env/secret store; no hard‑coded tokens
- Principle of least privilege (filesystem, network, DB)
- Safe defaults; timeouts; retries with backoff; circuit breakers (retry sketch after the Instruction Block)
- Dependency risks noted (and pinned where appropriate)
Instruction Block:
You are the Security Sentry. Identify trust boundaries, validate inputs, remove secret sprawl, enforce least privilege, add safe timeouts/retries. Point out insecure defaults and propose safe ones. Output a diff or bullet list of fixes.
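The "retries with backoff" checklist item, as a minimal sketch (the `retry_on` default is illustrative; a circuit breaker would wrap this one level up):
import random
import time

def retry_with_backoff(fn, *, retry_on=(TimeoutError,), attempts=4, base_delay=0.1, max_delay=2.0):
    """Call `fn`, retrying listed transient errors with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.5))  # jitter spreads retry storms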
6) Performance Tuner — keep it snappy
Purpose: spot obvious inefficiencies and add basic profiling hooks.
Checklist:
- Hot paths: eliminate quadratic work & needless copies
- Streaming/iterators for large data (see the sketch after the Instruction Block)
- Batching for IO
- Caching only with clear invalidation
- Provide a micro‑benchmark if relevant
Instruction Block:
You are the Performance Tuner. Identify big-O regressions, expensive allocations, and IO chatter. Suggest simple wins first. Add optional profiling hooks (guarded). Avoid premature micro-optimizations.
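The streaming and batching items above, in one sketch (`db.insert_many` and `read_records` are illustrative, not real APIs; on Python 3.12+, `itertools.batched` provides this directly):
from itertools import islice

def batched(iterable, size):
    """Yield lists of up to `size` items without materializing the whole input."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

# Usage: one bulk write per batch instead of one write per row.
# for batch in batched(read_records(path), 500):
#     db.insert_many(batch)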
7) Refactorer — improve without changing behavior
Purpose: rename, extract, and simplify while preserving tests.
Rules:
- No semantic changes; keep API stable unless flagged
- Extract pure functions; remove dead code
- Improve naming and cohesion; shrink function length
Instruction Block:
You are the Refactorer. Improve structure, names, and cohesion without changing behavior. Keep existing tests passing. Provide a concise rationale for each significant change.
8) Reviewer — enforce the rubric
Purpose: gatekeeper for quality.
Rubric (0–3 each; fail if any 0):
- Clarity: understandable without author present
- Correctness: behavior matches spec/plan; errors handled
- Docs: public APIs documented; tricky logic explained
- Tests: coverage of happy/edge/negative; fast and reliable
- Security: inputs validated; secrets safe; least privilege
- Performance: no obvious inefficiencies in hot paths
Instruction Block:
You are the Reviewer. Score the submission against the rubric. Provide a terse punch-list of required fixes before merge. No vague comments; cite lines/snippets.
File/Module header templates
Provenance header (place at top of generated files):
# Project: <name>
# File: <path>
# Purpose: <short description>
# Created by: codegen + human review. See /agents.md for standards.
# Notes: Keep functions small; document public APIs; ship with tests.
Python module docstring skeleton:
"""<One-line purpose>
Details:
- Responsibilities: <bullets>
- Inputs/Outputs: <summarize>
- Invariants: <what must remain true>
Examples:
>>> result = do_thing("input")
>>> assert result == "output"
"""
TypeScript/JS file header skeleton:
/**
* Purpose: <one-liner>
* Responsibilities: <bullets>
* Inputs/Outputs: <summary>
* Invariants: <list>
* See: /agents.md
*/
Function docstring/comment standards
Python (Google style):
def fetch_users(client: ApiClient, limit: int = 100) -> list[User]:
    """Return up to `limit` users from the API.

    Args:
        client: Authenticated API client.
        limit: Max number of users to return (1..1000).

    Returns:
        List of users sorted by `created_at` descending.

    Raises:
        ApiError: On non-2xx response.
        ValueError: If `limit` is out of bounds.

    Example:
        >>> users = fetch_users(client, limit=10)
        >>> assert users
    """
    ...
TypeScript:
/**
* Return up to `limit` users from the API.
* @param client Authenticated API client.
* @param limit Max number of users (1..1000). Default 100.
* @returns Users sorted by createdAt desc.
* @throws ApiError on non-2xx; RangeError if limit invalid.
* @example
* const users = await fetchUsers(client, 10);
*/
function fetchUsers(client: ApiClient, limit = 100): Promise<User[]> { /* ... */ }
Error handling & logging
- Raise/throw domain‑specific exceptions with context; never swallow errors silently.
- Include actionable messages: what failed, why likely, what to try.
- Prefer structured logs (JSON fields) for machine parsing.
- Add timeouts to network calls; retries with jitter for transient failures.
- Guard concurrency with clear ownership and cancellation support (async sketch after the snippets below).
Python snippet:
import httpx

try:
    res = httpx.get(url, timeout=5)
    res.raise_for_status()
except httpx.TimeoutException as e:
    # Structured key-value log fields (assumes a structlog-style logger)
    logger.warning("fetch_timeout", url=url, timeout=5, err=str(e))
    raise ApiError(f"Timed out fetching {url}") from e
TypeScript snippet:
const ctrl = new AbortController();
const t = setTimeout(() => ctrl.abort(), 5000);
try {
  const res = await fetch(url, { signal: ctrl.signal });
  if (!res.ok) throw new ApiError(`HTTP ${res.status}`);
} finally {
  clearTimeout(t);
}
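The concurrency bullet, sketched with asyncio (reuses the `ApiError` domain exception from the Python snippet; `fetch` stands in for any awaitable client call):
import asyncio

async def fetch_with_deadline(fetch, url: str, seconds: float = 5.0):
    """Bound the call with a deadline; cancelling the caller propagates into `fetch`."""
    try:
        return await asyncio.wait_for(fetch(url), timeout=seconds)
    except asyncio.TimeoutError as e:
        raise ApiError(f"Timed out fetching {url}") from e  # ApiError: your domain error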
Testing standards
- Structure: Arrange‑Act‑Assert; one behavior per test.
- Coverage: happy path + edge cases + one negative path.
- Speed: keep unit tests <100ms each; use markers for slow/integration tests (marker sketch after the pytest skeleton).
- Stability: no reliance on external services; use fakes/mocks.
- Discoverability: tests read like examples.
Python (pytest) skeleton:
import pytest

def test_parses_valid_record():
    rec = parse_record("id,42")
    assert rec.id == 42

@pytest.mark.parametrize("bad", ["", "id,", "id,NaN"])
def test_rejects_bad_input(bad):
    with pytest.raises(ValueError):
        parse_record(bad)
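The slow/integration markers from the standards above, as a sketch (the test name is hypothetical; register the marker so pytest doesn't warn):
import pytest

@pytest.mark.slow
def test_full_export_roundtrip():
    ...  # exercises the real pipeline end to end

# pyproject.toml:
# [tool.pytest.ini_options]
# markers = ["slow: long-running tests, excluded by default"]
# Run the fast suite with: pytest -m "not slow"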
TypeScript (vitest/jest) skeleton:
test('parses valid record', () => {
  const rec = parseRecord('id,42');
  expect(rec.id).toBe(42);
});

test.each(['', 'id,', 'id,NaN'])('rejects bad input: %s', (bad) => {
  expect(() => parseRecord(bad)).toThrow();
});
Style essentials (language‑agnostic)
- Naming: descriptive over terse (`retry_with_backoff` over `rwb`).
- Immutability: favor constants; avoid shared mutable state.
- Data: define explicit types/interfaces; validate at boundaries (validation sketch after this list).
- IO boundaries: wrap external services behind interfaces; inject dependencies.
- Comments: tell me why, not what the next line obviously does.
- Formatting: use formatters/linters (black/ruff, prettier/eslint) and fix warnings.
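The validate-at-boundaries item, as a minimal sketch with a plain dataclass (libraries like pydantic automate this; the names here are illustrative):
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    id: int
    name: str

def record_from_payload(payload: dict) -> Record:
    """Validate untrusted input once at the boundary; internals then trust `Record`."""
    if not isinstance(payload.get("id"), int):
        raise ValueError("id must be an integer")
    if not isinstance(payload.get("name"), str) or not payload["name"]:
        raise ValueError("name must be a non-empty string")
    return Record(id=payload["id"], name=payload["name"])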
Security & privacy quicklist
- Keep secrets in vault/env; never commit credentials.
- Zero‑trust inputs: validate/escape; limit memory use; protect against DoS via timeouts/limits.
- Sanitize logs (no PII/secrets). Redact tokens.
- Use parameterized queries; avoid string‑built SQL (sketch after this list).
- Respect licenses and attributions for third‑party code.
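The parameterized-query item, sketched with the stdlib `sqlite3` driver (placeholder syntax varies by driver: `?` here, `%s` for psycopg; the `users` table is illustrative):
import sqlite3

def find_user_id(conn: sqlite3.Connection, email: str) -> int | None:
    # The driver escapes the bound value; never splice it into the SQL string.
    row = conn.execute(
        "SELECT id FROM users WHERE email = ?", (email,)
    ).fetchone()
    return row[0] if row else None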
Dependency & build hygiene
- Pin versions where reproducibility matters. Document how to update.
- Separate prod vs dev dependencies.
- Provide a one-command setup (`make setup`, `uv sync`, `npm ci`).
- CI must run linters, tests, type checks, and security scans.
Definition of Done (merge gate)
- Architect plan exists and matches implementation
- Code passes rubric with no zeros
- Public APIs documented with examples
- Tests: happy + edge + negative; CI is green
- Security checklist items addressed
- Dependencies pinned (as policy dictates)
- No TODOs that block usage
PR template (drop into .github/pull_request_template.md)
### What & Why
<brief description>
### Design notes
- <assumptions>
- <tradeoffs>
### Tests
- [ ] Unit tests
- [ ] Negative/edge cases
- [ ] Added/updated fixtures
### Risk & Mitigation
- <failure modes>
- <rollout/rollback plan>
### Checklist
- [ ] Docs updated
- [ ] Security reviewed
- [ ] Performance considered
- [ ] Rubric passes locally
Prompts you can paste into codegen
Generate:
Write code per the Architect plan and /agents.md. Priorities: readability, safety, tests. Add docstrings to public APIs; short comments for non-obvious logic. Validate inputs; fail with clear errors. Include usage examples in docstrings. Small functions only.
Critique:
Review the diff against /agents.md rubric. Score each category (0–3). List concrete, line-anchored fixes. Refuse vague advice.
Refactor:
Refactor for clarity only. Keep behavior identical and tests passing. Improve naming, extract pure helpers, remove duplication.
Tests:
Create fast, deterministic unit tests that double as examples. Cover happy path, edges, and a negative case per function.
Example: before vs. after (Python)
Before
def c(d):
    r = []
    for x in d:
        if x%2==0:
            r.append(x*x)
    return r
After
def squares_of_even(values: list[int]) -> list[int]:
    """Return squares of even integers from `values`.

    Example:
        >>> squares_of_even([1, 2, 3, 4])
        [4, 16]
    """
    if values is None:
        raise ValueError("values must be a list of integers")
    return [v * v for v in values if v % 2 == 0]
Language‑specific knobs
Python
- Tools: `ruff`, `black`, `mypy`, `pytest`, `hypothesis` (optional)
- Packaging: `pyproject.toml`; prefer `uv`/`pip-tools` for locking
- Concurrency: prefer `asyncio`/`anyio` with timeouts and cancellation
TypeScript
- Tools: `eslint`, `prettier`, `tsc`, `vitest`/`jest`
- Types: strict mode; no `any` unless justified with a comment
- Runtime: `AbortSignal` for timeouts; fetch wrappers
House style (tweak as you like)
- Max line length 100; wrap docstrings and comments
- One concept per module; one responsibility per function
- Avoid default‑mutable args (Python); avoid global singletons (sketch at the end of this file)
- Prefer composition over inheritance
- Favor explicit returns and pure functions
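The default-mutable-args item, for reference; the standard `None`-sentinel pattern:
def append_tag(tag: str, tags: list[str] | None = None) -> list[str]:
    """Use a `None` sentinel: a `tags=[]` default would be shared across calls."""
    tags = [] if tags is None else tags
    tags.append(tag)
    return tags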