A good AGENTS.md is a model upgrade. A bad one is worse than no docs at all

· ai ai-agents coding · Source ↗

TLDR

  • Internal eval (AuggieBench) finds a well-structured AGENTS.md delivers gains equivalent to a Haiku-to-Opus upgrade; a poorly written one degrades output below the no-docs baseline.

Key Takeaways

  • AGENTS.md files of 100-150 lines, backed by a few focused reference docs, produced 10-15% cross-metric improvements in mid-size modules (~100 core files); longer files reversed those gains.
  • Procedural numbered workflows were the strongest single pattern: missing-wiring PRs dropped from 40% to 10%, correctness up 25%, completeness up 20% on a six-step deploy workflow.
  • Decision tables (e.g. React Query vs Zustand) resolved ambiguity before the agent wrote a single line; PRs in those areas scored 25% higher on best_practices adherence.
  • Lists of “don’ts” without paired “do” alternatives cause overexploration: the agent reads migration scripts, auth middleware, and version changelogs even on unrelated tasks.
  • AGENTS.md is auto-discovered 100% of the time, while orphan docs in _docs/ folders get opened in under 10% of sessions; if a doc must be seen, it needs to live in, or be referenced from, AGENTS.md.
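The three strongest patterns above (a numbered workflow, a decision table, and a “don’t” paired with a “do”) can be sketched in a minimal AGENTS.md fragment. This is illustrative only; the file paths, commands, and library choices are hypothetical, not taken from the study:

```markdown
## Adding a deploy step (follow in order)
1. Implement the handler in `src/handlers/`.
2. Register it in `src/registry.ts` (missing this is the top cause of broken PRs).
3. Add a config entry in `deploy/config.yaml`.
4. Run `npm run verify-wiring` before opening the PR.

## State management: which library?
| Situation                              | Use         |
|----------------------------------------|-------------|
| Server data (caching, refetch, retry)  | React Query |
| Client-only UI state                   | Zustand     |

## Conventions
- Don't call `fetch` directly; do use the `apiClient` wrapper in `src/lib/api.ts`.
```

Note the pairing in the last line: each prohibition names the sanctioned alternative, which is what keeps the agent from exploring unrelated files to find one.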

Hacker News Comment Review

  • Broad consensus that AGENTS.md is only the entry point: surrounding harness artifacts (skills, shell commands, auto-generated memories, spec sprawl) independently shape context and can overwhelm even a well-written file.
  • Counterpoint from a prominent OSS developer: projects with comprehensive tests and clear existing docs often need no AGENTS.md at all; the codebase itself orients the agent.
  • Practical reliability concern: at least one commenter found that lower-level (nested) AGENTS.md files are occasionally skipped by VS Code Copilot, weakening the hierarchy strategy the article recommends.

Notable Comments

  • @try-working: argues the full harness (skills, context, memory, decision history) should live in the repo for full portability across IDEs and models: “don’t let OpenAI or Anthropic own your work.”
  • @thegagne: built ktext.dev as a structured context format that can generate AGENTS.md and be validated and scored, directly targeting the quality-measurement gap this study highlights.

Original | Discuss on HN