Formatting a 25M-line codebase overnight


TLDR

  • Stripe’s Developer Productivity team rolled out rubyfmt, a Rust-based zero-config Ruby autoformatter, across a 25-million-line monorepo in a single overnight run.

Key Takeaways

  • rubyfmt is Rust-based and zero-config, designed for speed at the scale of the world’s largest Ruby codebase.
  • The team chose a single Saturday cutover over an incremental rollout in order to minimize merge conflicts.
  • They tackled the hardest syntax cases first, reasoning that a formatter that handles the edge cases will also cover the long tail.
  • High test suite confidence was required before committing a diff too large for GitHub to render.

Hacker News Comment Review

  • Commenters split on the all-at-once vs. incremental strategy: an incremental rollout, using scripts that skip files touched by open PRs, can safely reformat ~95% of files, while a single-day cutover produces one commit that is easy to list in git's blame.ignoreRevsFile but forces conflict resolution.
  • The “overnight” framing drew skepticism: one commenter benchmarked clang-format on 21M lines of Chromium in under 6 minutes on a 2014 Xeon, suggesting the timeline reflects process overhead, not raw formatter speed.
  • A formatter reliability tip surfaced: the Dart formatter walks the unformatted input and the formatted output in parallel, comparing non-whitespace characters to catch semantic corruption. Commenters suggested AST diffing as a stronger check.
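
The parallel-walk check described above can be sketched in a few lines. This is a minimal illustration, not the Dart formatter's or rubyfmt's actual code: the function names are hypothetical, and it assumes the formatter only ever changes whitespace. A formatter that also rewrites quotes, parentheses, or trailing commas would need those cases allowed explicitly, and whitespace changes inside string literals would slip through, which is one reason AST diffing is the stronger check.

```python
from itertools import zip_longest
from typing import Iterator, Optional


def significant_chars(text: str) -> Iterator[str]:
    """Yield every character of text that is not whitespace."""
    return (ch for ch in text if not ch.isspace())


def first_mismatch(original: str, formatted: str) -> Optional[int]:
    """Walk both texts in parallel, skipping whitespace.

    Returns the position (counted in non-whitespace characters) of the
    first difference, or None if the two texts agree everywhere.
    """
    pairs = zip_longest(significant_chars(original), significant_chars(formatted))
    for index, (before, after) in enumerate(pairs):
        # zip_longest pads the shorter stream with None, so a dropped or
        # added trailing character is reported as a mismatch too.
        if before != after:
            return index
    return None


if __name__ == "__main__":
    source = "def foo( a,b )\n  a+b\nend"
    output = "def foo(a, b)\n  a + b\nend\n"
    print(first_mismatch(source, output))
```

Run after every formatted file, a check like this turns a silent corruption into a hard failure before anything is committed.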

Notable Comments

  • @hobofan: describes an incremental PR-exclusion script strategy that safely reformatted 95% of files before the final sweep.
  • @munificent: explains Dart formatter’s whitespace-skipping sanity check as a model for catching silent reformatter bugs.
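
The incremental PR-exclusion idea mentioned in the comments amounts to a set difference: format everything that no open PR touches, and defer the rest to a final sweep. The sketch below is an assumption about how such a script might partition files, not the commenter's actual code; the function name and the `files_by_open_pr` mapping are hypothetical, and in practice that mapping would come from the code host's API.

```python
from typing import Dict, Iterable, List, Tuple


def plan_incremental_format(
    all_files: Iterable[str],
    files_by_open_pr: Dict[int, List[str]],
) -> Tuple[List[str], List[str]]:
    """Split files into (safe to format now, deferred to the final sweep)."""
    # Collect every path that any open PR modifies.
    touched = set()
    for paths in files_by_open_pr.values():
        touched.update(paths)

    files = list(all_files)
    safe_now = sorted(path for path in files if path not in touched)
    deferred = sorted(path for path in files if path in touched)
    return safe_now, deferred


if __name__ == "__main__":
    # Hypothetical repository contents and open PRs, for illustration only.
    repo_files = ["a.rb", "b.rb", "c.rb", "d.rb"]
    open_prs = {101: ["b.rb"], 102: ["b.rb", "x.rb"]}
    safe_now, deferred = plan_incremental_format(repo_files, open_prs)
    print(safe_now, deferred)
```

Re-running the plan as PRs merge shrinks the deferred list, leaving only a small remainder for the final sweep.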
