Step-by-step interactive walkthrough of LLM construction, from Common Crawl filtering through BPE tokenization, Transformer training, and RLHF post-training, based on Karpathy’s lecture.
Key Takeaways
FineWeb pipeline: 2.7B Common Crawl pages reduced to 44TB / 15T tokens via URL blocklists, MinHash deduplication, language filtering, and PII removal.
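A minimal, hedged sketch of what such a stage-by-stage filter could look like. These are toy heuristics and placeholder names, not the actual FineWeb code: the English check stands in for a real language-ID model, and the exact-signature match stands in for proper MinHash-LSH bucketing.

```python
import hashlib
import re

URL_BLOCKLIST = {"spam.example.com"}                    # hypothetical blocklist entry
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")      # toy PII pattern (emails only)

def minhash_signature(text, num_perm=32, shingle_len=5):
    """Tiny from-scratch MinHash: min of salted hashes over word shingles."""
    words = text.split()
    shingles = {" ".join(words[i:i + shingle_len])
                for i in range(max(1, len(words) - shingle_len + 1))}
    return tuple(
        min(int(hashlib.md5(f"{seed}|{s}".encode()).hexdigest(), 16) for s in shingles)
        for seed in range(num_perm)
    )

def keep_page(url, text, seen_signatures):
    if any(domain in url for domain in URL_BLOCKLIST):
        return False                                    # URL blocklist filter
    if " the " not in f" {text} ":
        return False                                    # crude stand-in for a real language-ID model
    sig = minhash_signature(text)
    if sig in seen_signatures:
        return False                                    # near-duplicate (exact match stands in for LSH)
    seen_signatures.add(sig)
    return True

def scrub_pii(text):
    return EMAIL_RE.sub("<EMAIL>", text)                # PII removal

seen = set()
page = ("https://example.org/post",
        "All about the weather. Contact me at a@b.com for more about the data.")
if keep_page(*page, seen):
    print(scrub_pii(page[1]))
```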
BPE builds GPT-4’s 100,277-token vocabulary by starting from 256 byte symbols and iteratively merging the most frequent adjacent pairs.
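A minimal sketch of that training loop, in the spirit of Karpathy's minbpe rather than the actual GPT-4 tokenizer: start from raw bytes, count adjacent pairs, mint a new token id for the most frequent pair, repeat.

```python
from collections import Counter

def train_bpe(text: str, vocab_size: int):
    ids = list(text.encode("utf-8"))             # start from the 256 raw byte values
    merges = {}                                   # (token_a, token_b) -> new token id
    next_id = 256
    while next_id < vocab_size:
        pairs = Counter(zip(ids, ids[1:]))        # count every adjacent pair in the corpus
        if not pairs:
            break
        pair = pairs.most_common(1)[0][0]         # most frequent adjacent pair
        merges[pair] = next_id
        out, i = [], 0
        while i < len(ids):                       # replace each occurrence with the new id
            if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
                out.append(next_id)
                i += 2
            else:
                out.append(ids[i])
                i += 1
        ids = out
        next_id += 1
    return merges                                 # vocabulary = 256 byte tokens + len(merges) merges

merges = train_bpe("low lower lowest " * 50, vocab_size=260)
print(len(merges))                                # 4 merges learned on top of the 256 byte tokens
```

Note that the vocabulary only grows: the original 256 byte tokens are never removed, which is the same point raised in the comment thread below.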
Training cost has collapsed: a GPT-2-quality model cost ~$40K to train in 2019, and an equivalent run now costs ~$100. Llama 3 trains 405B parameters on 15T tokens.
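For a sense of the compute behind that frontier run, the common 6·N·D rule of thumb (a standard approximation, not a figure quoted in the lecture) gives roughly:

```python
# Back-of-the-envelope training compute: ~6 FLOPs per parameter per token.
params = 405e9    # Llama 3, 405B parameters
tokens = 15e12    # 15T training tokens
flops = 6 * params * tokens
print(f"{flops:.2e} training FLOPs")   # ~3.6e25
```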
Base model is a stochastic internet autocomplete engine; SFT on labeled conversations and RLHF preference ranking convert it into a chat assistant.
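One piece of that post-training recipe can be made concrete. Below is a hedged sketch of the pairwise Bradley-Terry-style loss a reward model typically minimizes over human preference rankings; the names and values are illustrative, not taken from the lecture.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # Push the reward of the human-preferred completion above the rejected one.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

chosen = torch.tensor([1.2, 0.3])     # reward-model scores for the preferred answers
rejected = torch.tensor([0.4, -0.1])  # scores for the dispreferred answers
print(preference_loss(chosen, rejected).item())  # small positive loss, ~0.44
```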
Inference is autoregressive and stochastic; temperature 0.7-1.0 is the practical sweet spot, and tool use works by emitting special tokens that pause generation and inject results into the context window.
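A minimal sketch of the temperature-scaled sampling step at the core of that loop (toy logits, not a real model's output; the tool-use pause-and-inject mechanism is omitted):

```python
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float = 0.8, rng=None) -> int:
    rng = rng or np.random.default_rng()
    scaled = logits / temperature            # <1.0 sharpens the distribution, >1.0 flattens it
    probs = np.exp(scaled - scaled.max())    # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

logits = np.array([2.0, 1.0, 0.2, -1.0])     # pretend scores for four candidate tokens
print(sample_next_token(logits, temperature=0.7))
```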
Hacker News Comment Review
Dominant thread concern is unproofread AI-generated copy: the claim that 44TB “roughly fits on a single hard drive” overshoots the current retail maximum (~32TB) by roughly 1.4x, eroding trust in the technical content.
The BPE diagram drew a specific technical objection: BPE is purely additive and never removes the original 256 byte-level tokens, so a diagram that depicts those tokens being replaced teaches a genuine misconception.
Commenters broadly redirected readers to Jay Alammar’s “The Illustrated GPT-2” as the established human-authored reference for the same material.
Notable Comments
@vova_hn2: BPE visualization misleads by implying old tokens are discarded; the process only adds merges, always retaining all 256 byte tokens.
@gushogg-blake: Guide skips embedding depth entirely: how the input side of the network represents N context tokens, and how embeddings handle tokens with context-dependent meanings.
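The question raised in that last comment has a fairly standard answer worth sketching (GPT-2-small-like sizes and arbitrary token ids, chosen for illustration; this is not from the guide itself): token and positional embeddings are static lookups, and the context-dependent meaning the commenter asks about only emerges in the attention layers that follow.

```python
import torch
import torch.nn as nn

vocab_size, d_model, max_ctx = 50_257, 768, 1024    # GPT-2-small-like dimensions
token_emb = nn.Embedding(vocab_size, d_model)       # one fixed vector per token id
pos_emb = nn.Embedding(max_ctx, d_model)            # one fixed vector per position

token_ids = torch.tensor([[101, 2009, 318]])        # batch of 1 sequence, N = 3 arbitrary token ids
positions = torch.arange(token_ids.shape[1])
x = token_emb(token_ids) + pos_emb(positions)       # (1, 3, 768): the Transformer's input activations
print(x.shape)
```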