Affirm paused product delivery for one week in February 2026, ran 800+ engineers through a mandatory agentic workflow using Claude Code, and reached 60% agent-assisted PRs.
Key Takeaways
The default toolchain was Claude Code with a single workflow: Plan, Review, Execute, Verify, Review, Deliver — one task equals one agent session equals one PR.
92% of the engineering org submitted at least one agentic PR by week's end; token spend landed at ~$140k of a $200k budget (~$250/engineer).
Code review was the top friction point (40% of survey respondents cited it unprompted); CI end-to-end suites took 100+ minutes per run, incompatible with agents' change-validate-fix loops.
MCP integrations created a security bottleneck — CLIs proved more reliable; centralized governance couldn’t keep up with the volume of integration requests.
Agents generating both implementation and tests in one session can produce mutually confirming errors; Affirm is piloting cross-model independent validation against acceptance criteria.
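The spend and CI figures above are internally consistent, and a quick back-of-the-envelope check makes the scale concrete. This is a sketch: the per-engineer average and the CI wall-clock totals are derived here, not stated in the post.

```python
# Check the published figures (derived values are illustrative, not from the post).
engineers = 800            # "800+ engineers"
budget_per_engineer = 250  # "~$250/engineer"
total_budget = engineers * budget_per_engineer
assert total_budget == 200_000          # matches the stated $200k budget

actual_spend = 140_000                  # "~$140k" token spend
print(f"budget used: {actual_spend / total_budget:.0%}")           # 70%
print(f"avg spend per engineer: ${actual_spend / engineers:.0f}")  # $175

# CI friction: a 100+ minute e2e suite sets a floor on each
# change-validate-fix iteration's wall-clock cost.
ci_minutes = 100
for iterations in (1, 3, 5):
    hours = iterations * ci_minutes / 60
    print(f"{iterations} fix iteration(s): >= {hours:.1f} h of CI alone")
```

At five fix iterations an agent spends over a full working day waiting on CI, which is why the post frames the 100+ minute suites as incompatible with agent loops.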
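The mutually-confirming-errors problem and its mitigation can be sketched in code. The shape below is an assumption about what "cross-model independent validation against acceptance criteria" might look like; the names (`AcceptanceCriterion`, `validate_independently`, `reviewer_model`) are hypothetical, not Affirm's actual tooling.

```python
# Hypothetical sketch of cross-model independent validation.
# Because the implementing agent also wrote the tests, its errors can be
# mutually confirming; here a second, different model judges only the diff
# against human-authored acceptance criteria it had no hand in writing.
from dataclasses import dataclass
from typing import Callable

@dataclass
class AcceptanceCriterion:
    description: str
    check: Callable[[str], bool]  # deterministic inspection of the diff

def validate_independently(diff: str,
                           criteria: list[AcceptanceCriterion],
                           reviewer_model: Callable[[str], bool]) -> list[str]:
    """Return the criteria the diff fails, as judged by an independent model."""
    failures = []
    for c in criteria:
        # Deterministic check first, then the independent model's judgment.
        if not c.check(diff) or not reviewer_model(f"{c.description}\n{diff}"):
            failures.append(c.description)
    return failures
```

The key design point is that the reviewer model never sees the implementing agent's tests, only the diff and the acceptance criteria, so a shared misunderstanding between implementation and tests cannot validate itself.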
Hacker News Comment Review
The dominant reaction is skepticism toward top-down mandates in a fintech context: commenters argue that a 12-year-old monorepo with bloated test suites, unstable CI, and financial liability is exactly the wrong environment for rapid agentic adoption, and that each listed bottleneck is already a major risk multiplier.
Several commenters treat the 60% agent-assisted PR metric as a lagging indicator rather than a safety signal, predicting that misunderstood requirements will ship undetected for months before surfacing as a breach or data incident.
A recurring thread questions whether the week’s metrics — PR volume, adoption rate — can capture quality regression, or whether they mainly serve to demonstrate AI ROI to executives.
Notable Comments
@dotdi: points out that Affirm's own post lists bloated test suites, manual review, unstable CI, and slow deploy infrastructure as existing problems, and argues that each one is a hard blocker for agents on its own, before they even compound.
@ookblah: argues that measuring agentic productivity over one week on metrics nobody fully understands is “someone getting paid to bullshit some metrics to executives” and predicts a visible failure within 12 months.