Internal eval (AuggieBench) finds a well-structured AGENTS.md delivers gains equivalent to a Haiku-to-Opus upgrade; a poorly written one degrades output below the no-docs baseline.
Key Takeaways
AGENTS.md files of 100-150 lines, paired with a few focused reference docs, produced 10-15% improvements across metrics in mid-size modules (~100 core files); longer files reversed those gains.
Numbered procedural workflows were the strongest single pattern: on a six-step deploy workflow, missing-wiring PRs dropped from 40% to 10%, correctness rose 25%, and completeness rose 20%.
Decision tables (e.g. React Query vs Zustand) resolved ambiguity before the agent wrote a single line; PRs in those areas scored 25% higher on best_practices adherence.
Lists of “don’ts” without paired “do” alternatives caused overexploration: the agent read migration scripts, auth middleware, and version changelogs even on unrelated tasks.
AGENTS.md is auto-discovered 100% of the time; orphan docs in _docs/ folders get opened in under 10% of sessions – if a doc must be seen, it needs to live in or be referenced from AGENTS.md.
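Taken together, the winning patterns above (a numbered workflow, a decision table, paired don't/do entries, and an explicit doc reference) might look like the following AGENTS.md fragment. This is a minimal sketch; the module names, commands, steps, and file paths are hypothetical illustrations, not content from the study.

```markdown
## Deploy workflow
Follow these steps in order; do not skip the wiring check.
1. Add the feature flag to `config/flags.yaml`.
2. Register the handler in `src/registry.ts`.
3. Wire the route in `src/router.ts`.
4. Run `npm run typecheck`.
5. Run `npm test`.
6. Confirm the flag appears on the staging dashboard before merging.

## State management: which library to use
| Situation                           | Use         |
|-------------------------------------|-------------|
| Server data (fetch, cache, refetch) | React Query |
| Shared client-only UI state         | Zustand     |
| Ephemeral single-component state    | `useState`  |

## Don'ts (each paired with a "do")
- Don't hand-edit generated migration files; do create a new
  migration instead.
- Don't infer current behavior from `scripts/legacy/`; do read
  `docs/architecture.md` (linked here so it is actually discovered).
```

Note the last line: referencing `docs/architecture.md` from AGENTS.md is what moves it from an orphan doc (opened in under 10% of sessions) into the agent's discovered context.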
Hacker News Comment Review
Broad consensus that AGENTS.md is only the entry point: surrounding harness artifacts (skills, shell commands, auto-generated memories, spec sprawl) independently shape context and can overwhelm even a well-written file.
Counterpoint from a prominent OSS developer: projects with comprehensive tests and clear existing docs often need no AGENTS.md at all – the codebase itself orients the agent.
Practical reliability concern: at least one commenter found that lower-level (nested) AGENTS.md files are occasionally skipped by VS Code Copilot, weakening the hierarchy strategy the article recommends.
Notable Comments
@try-working: argues the full harness (skills, context, memory, decision history) should live in the repo for full portability across IDEs and models – “don’t let OpenAI or Anthropic own your work.”
@thegagne: built ktext.dev as a structured context format that can generate AGENTS.md and be validated and scored, directly targeting the quality-measurement gap this study highlights.