Printing Blogs

TLDR

  • The author prints entire blogs as physical booklets, using a pipeline that scores essays by their Google Search API rank and solves a Knapsack problem to select them within a word-count budget.

Key Takeaways

  • Selection is automated: each post's rank in Google Search API results is turned into a score on the assumption that quality is Zipf-distributed; author-recommended posts get a baseline point to anchor the offset.
  • Word count is the Knapsack cost; a 20-euro, 150-page budget targets ~75k words per blog (about 500 words per page), roughly 41 Paul Graham essays.
  • Pandoc renders the body text; Typst generates the cover pages; a headless browser converts interactive HTML plots to static PNGs for print.
  • LLM clustering was tried and abandoned because the results were poor; manual chapter grouping was faster and better for all five blogs processed.
  • Blogs covered: paulgraham.com, marginalrevolution.com, maxhodak.com, guzey.com, fi-le.net – each required custom formatting fixes.
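
The selection step described above can be sketched as a 0/1 Knapsack over word counts. This is a minimal illustration, not the author's code: the scoring formula (value = 1/rank, plus a flat bonus for recommended posts), the `Post` fields, and the names `zipf_value` and `select_posts` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Post:
    title: str
    words: int
    rank: Optional[int]  # 1-based search rank; None if the post was not ranked
    recommended: bool = False


def zipf_value(post: Post, baseline: float = 1.0) -> float:
    """Assumed scoring: Zipf-like 1/rank, plus a flat bonus if recommended."""
    value = 1.0 / post.rank if post.rank else 0.0
    if post.recommended:
        value += baseline
    return value


def select_posts(posts: List[Post], budget_words: int, step: int = 100) -> List[Post]:
    """0/1 Knapsack by dynamic programming; cost is word count in `step`-word units."""
    cap = budget_words // step
    # Round each cost up so the selection can never exceed the real budget.
    costs = [-(-p.words // step) for p in posts]
    values = [zipf_value(p) for p in posts]

    best = [0.0] * (cap + 1)
    taken = [[False] * (cap + 1) for _ in posts]
    for i, (cost, value) in enumerate(zip(costs, values)):
        # Iterate capacities downwards so each post is used at most once.
        for w in range(cap, cost - 1, -1):
            if best[w - cost] + value > best[w]:
                best[w] = best[w - cost] + value
                taken[i][w] = True

    # Walk back through the decisions to recover the chosen posts.
    chosen, w = [], cap
    for i in range(len(posts) - 1, -1, -1):
        if taken[i][w]:
            chosen.append(posts[i])
            w -= costs[i]
    chosen.reverse()
    return chosen


if __name__ == "__main__":
    demo = [
        Post("A", 3000, rank=1),
        Post("B", 2000, rank=2),
        Post("C", 2000, rank=3, recommended=True),
        Post("D", 1000, rank=4),
    ]
    for post in select_posts(demo, budget_words=5000):
        print(post.title, post.words)
```

With the ~75k-word budget from the summary, the call would be `select_posts(posts, budget_words=75_000)`. Discretizing the cost into 100-word units keeps the table small, at the price of slightly overestimating each post's length.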

Hacker News Comment Review

  • No substantive HN discussion yet.
