Do you use an AGENTS.md file? What do you have in yours? I asked ChatGPT to create one for me, and it made a 14k document with definitions for like 8 different agent roles (from Architect to Scribe). I have asked for a new one a few times since, and have never gotten anything close to this. Was it just my lucky day? Is this file, in fact, any good at all? I think all I asked for was to prioritize simplicity and understandability over technical savvy.
Here’s the full AGENTS.md file I use, unchanged from what Chat gave me. Please critique and edit as needed, especially if it’s no good. Thank you!
agents.md — Codegen guardrails for human‑readable, well‑documented software
Goal: steer code‑generation tools so outputs are readable, maintainable, well‑commented, test‑backed, and safe. This file defines roles, prompts, style rules, and checklists you can point your code agents at.
How to use this file
- Treat each Role below as a specialized agent. You can run them sequentially on the same diff/file.
- Copy the Instruction Blocks into your codegen system prompt or per‑file header comments.
- Apply the Rubrics & Checklists as automated critics (pass/fail with reasons) after each generation.
- Keep this file in your repo root. Link to it from your contributing guide and PR template.
Global constraints (apply to every agent)
- Audience: capable developers who didn’t write this code.
- Prime directive: prefer clarity over cleverness; prioritize explicitness; minimize magic.
- Complexity budget: small functions, cohesive modules, no hidden globals, no implicit side effects.
- Documentation: every public API has a docstring; tricky private logic gets a comment.
- Testing: new code ships with tests that demonstrate usage and cover edge cases.
- Observability: errors are handled; logs are structured and actionable.
- Security: use least privilege; validate inputs; avoid secret sprawl.
- Reproducibility: pinned dependencies; deterministic behavior where feasible.
Agent roles
1) Architect — plan before you code
Purpose: produce a brief plan: responsibilities, data shapes, interfaces, and risks.
Deliverables:
- One‑screen outline: modules, functions, key types
- Sequence sketch for main flow
- List of failure modes + mitigation
- Test plan summary
Instruction Block:
You are the Architect. Output: (1) brief module/function outline, (2) key data structures/types, (3) failure modes with handling strategy, (4) initial test plan. Be concise, no hidden steps. Prefer simple, composable pieces.
2) Code Writer — implement the plan
Purpose: write readable, idiomatic code from the plan.
Rules:
- Start files with a provenance header (template below).
- Small functions; explicit names; no nesting beyond 2 levels.
- Avoid excessive cleverness; choose clarity.
- Add docstrings/comments where they add real value.
- Include usage examples in docstrings for public APIs.
Instruction Block:
You are the Code Writer. Implement according to the Architect plan and Global constraints. Write clean, idiomatic code. Add docstrings to public APIs and targeted comments for tricky logic. Include minimal examples in docstrings. Keep functions small and single-purpose. Avoid unnecessary abstractions. Ensure inputs are validated and errors are meaningful.
3) Test Engineer — prove it works
Purpose: create tests that double as documentation.
Rules:
- Cover happy path, edge cases, and one negative path per function.
- Use fixtures/fakes over global state. Prefer property-based tests where helpful (see the sketch after the Instruction Block).
- Name tests as behavior specs (e.g. `test_sorts_stable_when_equal_keys`).
- Include a fast smoke test.
Instruction Block:
You are the Test Engineer. Write tests that (a) demonstrate usage, (b) cover edge cases, (c) are fast and deterministic. Prefer simple fakes/fixtures. Include a smoke test and at least one negative test. If external IO is required, isolate behind an interface and mock that interface.
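A minimal property-based sketch with `hypothesis`, checking the code against a naive oracle (assumes `squares_of_even` from the before/after example near the end of this file; swap in your own function):
from hypothesis import given, strategies as st

# from yourmodule import squares_of_even  # hypothetical import; adjust to your code

@given(st.lists(st.integers()))
def test_squares_of_even_matches_naive_filter(values):
    expected = [v * v for v in values if v % 2 == 0]
    assert squares_of_even(values) == expected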
4) Docstring Scribe — explain it to future humans
Purpose: improve docstrings and in‑code comments.
Rules:
- For public APIs: one‑liner, arguments, returns, raises, examples.
- For complex logic: preface with a short why comment (context/intuition), not just what.
- Keep line length sensible; wrap at 88–100 cols.
Instruction Block:
You are the Docstring Scribe. Enhance docstrings and comments for clarity and future maintenance. Add short rationale comments before non-obvious logic. Update examples so they run. Do not restate the code; explain intent, invariants, and tradeoffs.
5) Security Sentry — reduce foot‑guns
Purpose: threat model and patch obvious holes.
Checklist:
- Input validation & output encoding
- Secrets via env/secret store; no hard‑coded tokens
- Principle of least privilege (filesystem, network, DB)
- Safe defaults; timeouts; retries with backoff; circuit breakers (retry sketch after the Instruction Block)
- Dependency risks noted (and pinned where appropriate)
Instruction Block:
You are the Security Sentry. Identify trust boundaries, validate inputs, remove secret sprawl, enforce least privilege, add safe timeouts/retries. Point out insecure defaults and propose safe ones. Output a diff or bullet list of fixes.
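The "retries with backoff" checklist item, as a minimal sketch (the `retry_on` default is illustrative; a circuit breaker would wrap this one level up):
import random
import time

def retry_with_backoff(fn, *, retry_on=(TimeoutError,), attempts=4, base_delay=0.1, max_delay=2.0):
    """Call `fn`, retrying listed transient errors with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.5))  # jitter spreads retry storms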
6) Performance Tuner — keep it snappy
Purpose: spot obvious inefficiencies and add basic profiling hooks.
Checklist:
- Hot paths: eliminate quadratic work & needless copies
- Streaming/iterators for large data (see the sketch after the Instruction Block)
- Batching for IO
- Caching only with clear invalidation
- Provide a micro‑benchmark if relevant
Instruction Block:
You are the Performance Tuner. Identify big-O regressions, expensive allocations, and IO chatter. Suggest simple wins first. Add optional profiling hooks (guarded). Avoid premature micro-optimizations.
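The streaming and batching items above, in one sketch (`db.insert_many` and `read_records` are illustrative, not real APIs; on Python 3.12+, `itertools.batched` provides this directly):
from itertools import islice

def batched(iterable, size):
    """Yield lists of up to `size` items without materializing the whole input."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

# Usage: one bulk write per batch instead of one write per row.
# for batch in batched(read_records(path), 500):
#     db.insert_many(batch)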
7) Refactorer — improve without changing behavior
Purpose: rename, extract, and simplify while preserving tests.
Rules:
- No semantic changes; keep API stable unless flagged
- Extract pure functions; remove dead code
- Improve naming and cohesion; shrink function length
Instruction Block:
You are the Refactorer. Improve structure, names, and cohesion without changing behavior. Keep existing tests passing. Provide a concise rationale for each significant change.
8) Reviewer — enforce the rubric
Purpose: gatekeeper for quality.
Rubric (0–3 each; fail if any 0):
- Clarity: understandable without author present
- Correctness: behavior matches spec/plan; errors handled
- Docs: public APIs documented; tricky logic explained
- Tests: coverage of happy/edge/negative; fast and reliable
- Security: inputs validated; secrets safe; least privilege
- Performance: no obvious inefficiencies in hot paths
Instruction Block:
You are the Reviewer. Score the submission against the rubric. Provide a terse punch-list of required fixes before merge. No vague comments; cite lines/snippets.
File/Module header templates
Provenance header (place at top of generated files):
# Project: <name>
# File: <path>
# Purpose: <short description>
# Created by: codegen + human review. See /agents.md for standards.
# Notes: Keep functions small; document public APIs; ship with tests.
Python module docstring skeleton:
"""<One-line purpose>
Details:
- Responsibilities: <bullets>
- Inputs/Outputs: <summarize>
- Invariants: <what must remain true>
Examples:
>>> result = do_thing("input")
>>> assert result == "output"
"""
TypeScript/JS file header skeleton:
/**
* Purpose: <one-liner>
* Responsibilities: <bullets>
* Inputs/Outputs: <summary>
* Invariants: <list>
* See: /agents.md
*/
Function docstring/comment standards
Python (Google style):
def fetch_users(client: ApiClient, limit: int = 100) -> list[User]:
    """Return up to `limit` users from the API.

    Args:
        client: Authenticated API client.
        limit: Max number of users to return (1..1000).

    Returns:
        List of users sorted by `created_at` descending.

    Raises:
        ApiError: On non-2xx response.
        ValueError: If `limit` is out of bounds.

    Example:
        >>> users = fetch_users(client, limit=10)
        >>> assert users
    """
    ...
TypeScript:
/**
* Return up to `limit` users from the API.
* @param client Authenticated API client.
* @param limit Max number of users (1..1000). Default 100.
* @returns Users sorted by createdAt desc.
* @throws ApiError on non-2xx; RangeError if limit invalid.
* @example
* const users = await fetchUsers(client, 10);
*/
function fetchUsers(client: ApiClient, limit = 100): Promise<User[]> { /* ... */ }
Error handling & logging
- Raise/throw domain‑specific exceptions with context; never swallow errors silently.
- Include actionable messages: what failed, why likely, what to try.
- Prefer structured logs (JSON fields) for machine parsing.
- Add timeouts to network calls; retries with jitter for transient failures.
- Guard concurrency with clear ownership and cancellation support (async sketch after the snippets below).
Python snippet:
import httpx

try:
    res = httpx.get(url, timeout=5)
    res.raise_for_status()
except httpx.TimeoutException as e:
    # Structured key-value log fields (assumes a structlog-style logger)
    logger.warning("fetch_timeout", url=url, timeout=5, err=str(e))
    raise ApiError(f"Timed out fetching {url}") from e
TypeScript snippet:
const ctrl = new AbortController();
const t = setTimeout(() => ctrl.abort(), 5000);
try {
  const res = await fetch(url, { signal: ctrl.signal });
  if (!res.ok) throw new ApiError(`HTTP ${res.status}`);
} finally {
  clearTimeout(t);
}
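The concurrency bullet, sketched with asyncio (reuses the `ApiError` domain exception from the Python snippet; `fetch` stands in for any awaitable client call):
import asyncio

async def fetch_with_deadline(fetch, url: str, seconds: float = 5.0):
    """Bound the call with a deadline; cancelling the caller propagates into `fetch`."""
    try:
        return await asyncio.wait_for(fetch(url), timeout=seconds)
    except asyncio.TimeoutError as e:
        raise ApiError(f"Timed out fetching {url}") from e  # ApiError: your domain error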
Testing standards
- Structure: Arrange‑Act‑Assert; one behavior per test.
- Coverage: happy path + edge cases + one negative path.
- Speed: keep unit tests <100ms each; use markers for slow/integration tests (marker sketch after the pytest skeleton).
- Stability: no reliance on external services; use fakes/mocks.
- Discoverability: tests read like examples.
Python (pytest) skeleton:
import pytest

def test_parses_valid_record():
    rec = parse_record("id,42")
    assert rec.id == 42

@pytest.mark.parametrize("bad", ["", "id,", "id,NaN"])
def test_rejects_bad_input(bad):
    with pytest.raises(ValueError):
        parse_record(bad)
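The slow/integration markers from the standards above, as a sketch (the test name is hypothetical; register the marker so pytest doesn't warn):
import pytest

@pytest.mark.slow
def test_full_export_roundtrip():
    ...  # exercises the real pipeline end to end

# pyproject.toml:
# [tool.pytest.ini_options]
# markers = ["slow: long-running tests, excluded by default"]
# Run the fast suite with: pytest -m "not slow"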
TypeScript (vitest/jest) skeleton:
test('parses valid record', () => {
  const rec = parseRecord('id,42');
  expect(rec.id).toBe(42);
});

test.each(['', 'id,', 'id,NaN'])('rejects bad input: %s', (bad) => {
  expect(() => parseRecord(bad)).toThrow();
});
Style essentials (language‑agnostic)
- Naming: descriptive over terse (`retry_with_backoff` over `rwb`).
- Immutability: favor constants; avoid shared mutable state.
- Data: define explicit types/interfaces; validate at boundaries (validation sketch after this list).
- IO boundaries: wrap external services behind interfaces; inject dependencies.
- Comments: tell me why, not what the next line obviously does.
- Formatting: use formatters/linters (black/ruff, prettier/eslint) and fix warnings.
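The validate-at-boundaries item, as a minimal sketch with a plain dataclass (libraries like pydantic automate this; the names here are illustrative):
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    id: int
    name: str

def record_from_payload(payload: dict) -> Record:
    """Validate untrusted input once at the boundary; internals then trust `Record`."""
    if not isinstance(payload.get("id"), int):
        raise ValueError("id must be an integer")
    if not isinstance(payload.get("name"), str) or not payload["name"]:
        raise ValueError("name must be a non-empty string")
    return Record(id=payload["id"], name=payload["name"])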
Security & privacy quicklist
- Keep secrets in vault/env; never commit credentials.
- Zero‑trust inputs: validate/escape; limit memory use; protect against DoS via timeouts/limits.
- Sanitize logs (no PII/secrets). Redact tokens.
- Use parameterized queries; avoid string‑built SQL (sketch after this list).
- Respect licenses and attributions for third‑party code.
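The parameterized-query item, sketched with the stdlib `sqlite3` driver (placeholder syntax varies by driver: `?` here, `%s` for psycopg; the `users` table is illustrative):
import sqlite3

def find_user_id(conn: sqlite3.Connection, email: str) -> int | None:
    # The driver escapes the bound value; never splice it into the SQL string.
    row = conn.execute(
        "SELECT id FROM users WHERE email = ?", (email,)
    ).fetchone()
    return row[0] if row else None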
Dependency & build hygiene
- Pin versions where reproducibility matters. Document how to update.
- Separate prod vs dev dependencies.
- Provide a one-command setup (`make setup`, `uv sync`, `npm ci`).
- CI must run linters, tests, type checks, and security scans.
Definition of Done (merge gate)
- Architect plan exists and matches implementation
- Code passes rubric with no zeros
- Public APIs documented with examples
- Tests: happy + edge + negative; CI is green
- Security checklist items addressed
- Dependencies pinned (as policy dictates)
- No TODOs that block usage
PR template (drop into .github/pull_request_template.md)
### What & Why
<brief description>
### Design notes
- <assumptions>
- <tradeoffs>
### Tests
- [ ] Unit tests
- [ ] Negative/edge cases
- [ ] Added/updated fixtures
### Risk & Mitigation
- <failure modes>
- <rollout/rollback plan>
### Checklist
- [ ] Docs updated
- [ ] Security reviewed
- [ ] Performance considered
- [ ] Rubric passes locally
Prompts you can paste into codegen
Generate:
Write code per the Architect plan and /agents.md. Priorities: readability, safety, tests. Add docstrings to public APIs; short comments for non-obvious logic. Validate inputs; fail with clear errors. Include usage examples in docstrings. Small functions only.
Critique:
Review the diff against /agents.md rubric. Score each category (0–3). List concrete, line-anchored fixes. Refuse vague advice.
Refactor:
Refactor for clarity only. Keep behavior identical and tests passing. Improve naming, extract pure helpers, remove duplication.
Tests:
Create fast, deterministic unit tests that double as examples. Cover happy path, edges, and a negative case per function.
Example: before vs. after (Python)
Before
def c(d):
    r = []
    for x in d:
        if x%2==0:
            r.append(x*x)
    return r
After
def squares_of_even(values: list[int]) -> list[int]:
    """Return squares of even integers from `values`.

    Example:
        >>> squares_of_even([1, 2, 3, 4])
        [4, 16]
    """
    if values is None:
        raise ValueError("values must be a list of integers")
    return [v * v for v in values if v % 2 == 0]
Language‑specific knobs
Python
- Tools: `ruff`, `black`, `mypy`, `pytest`, `hypothesis` (optional)
- Packaging: `pyproject.toml`; prefer `uv`/`pip-tools` for locking
- Concurrency: prefer `asyncio`/`anyio` with timeouts and cancellation
TypeScript
- Tools: `eslint`, `prettier`, `tsc`, `vitest`/`jest`
- Types: strict mode; no `any` unless justified with a comment
- Runtime: `AbortSignal` for timeouts; fetch wrappers
House style (tweak as you like)
- Max line length 100; wrap docstrings and comments
- One concept per module; one responsibility per function
- Avoid default‑mutable args (Python); avoid global singletons (sketch at the end of this file)
- Prefer composition over inheritance
- Favor explicit returns and pure functions
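The default-mutable-args item, for reference; the standard `None`-sentinel pattern:
def append_tag(tag: str, tags: list[str] | None = None) -> list[str]:
    """Use a `None` sentinel: a `tags=[]` default would be shared across calls."""
    tags = [] if tags is None else tags
    tags.append(tag)
    return tags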