npm library that replaces UUIDs with IDs built from a curated 4096-word list: ~14 tokens vs ~23 for a UUID v4, because every word is a single BPE token on o200k_base.
Key Takeaways
Every word in the wordlist is exactly 1 BPE token on o200k_base (GPT-4o, GPT-4.1, o1, o3); hyphens add ~1 token per 6 words.
The default 8-word ID carries ~96 bits of entropy (8 words × 12 bits each), so more than 300 trillion IDs can be generated before the collision probability reaches 50%.
createAliasMap swaps existing UUIDs for short word aliases before they enter the LLM context and restores them afterward, with no schema changes.
idAgent.from() generates deterministic IDs via HMAC-SHA256, so the same input (an email address, for example) always maps to the same ID.
detectDuplicates and validate are pure utility functions; all public APIs are zod-validated and throw descriptive errors on bad input.
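The entropy and collision figures in the takeaways above follow from standard birthday-bound arithmetic. A minimal sketch of that calculation, assuming words are drawn uniformly at random from the 4096-word list (the constant and variable names here are illustrative, not taken from the library):

```typescript
// Worked arithmetic for the figures quoted above. Assumes each word is
// chosen uniformly at random from the 4096-word list.
const WORDLIST_SIZE = 4096;
const WORDS_PER_ID = 8;

// Each word contributes log2(4096) = 12 bits.
const bitsPerWord = Math.log2(WORDLIST_SIZE);
const totalBits = bitsPerWord * WORDS_PER_ID; // 96

// Birthday approximation: with N possible IDs, the collision probability
// reaches 50% after about sqrt(2 * N * ln 2) randomly generated IDs.
const idSpace = Math.pow(2, totalBits);
const idsFor50PercentCollision = Math.sqrt(2 * Math.LN2 * idSpace);

// Roughly 3.3e14, i.e. a little over 300 trillion.
console.log(totalBits, idsFor50PercentCollision.toExponential(2));
```

The approximation is only meaningful for IDs generated randomly; deterministic IDs derived from inputs collide according to how the inputs are distributed, not according to this bound.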
Hacker News Comment Review
Commenters question long-term viability: tokenization schemes change across model generations, so the wordlist curation advantage may erode as new tokenizers ship.
A practical counter-pattern was raised: just use incremental integers (1, 2, 3) as local IDs inside the context window and map to UUIDs externally, avoiding word-collision and prompt-injection risks entirely.
Prompt injection via word-based IDs is a noted risk; the author points to validate() as a harness check, though no benchmark on hallucination rate vs UUID exists yet.
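The integer counter-pattern raised in the comments can be sketched in a few lines. This is a hypothetical helper written for illustration, not part of id-agent; the class name `LocalIdMap` and its methods are assumptions:

```typescript
// Hypothetical helper (not an id-agent API): assigns small integers to UUIDs
// for the lifetime of one prompt and maps them back afterward.
class LocalIdMap {
  private readonly toLocal = new Map<string, number>();
  private readonly toUuid: string[] = [];

  // Return the existing local ID for a UUID, or assign the next integer.
  alias(uuid: string): number {
    const existing = this.toLocal.get(uuid);
    if (existing !== undefined) return existing;
    const id = this.toUuid.length + 1; // local IDs are 1-based
    this.toLocal.set(uuid, id);
    this.toUuid.push(uuid);
    return id;
  }

  // Resolve a local ID found in model output. Returns undefined for any
  // value that was never issued, which flags an ID the model invented.
  restore(localId: number): string | undefined {
    return Number.isInteger(localId) ? this.toUuid[localId - 1] : undefined;
  }
}

const ids = new LocalIdMap();
const first = ids.alias("123e4567-e89b-12d3-a456-426614174000");
const second = ids.alias("9b2c1f6a-0d3e-4a7b-8c5d-1e2f3a4b5c6d");
console.log(first, second, ids.restore(first));
```

The trade-off is that the map is state that must live alongside the conversation; the integers mean nothing outside the session that issued them.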
Notable Comments
@simedw: Suggests createAliasMap could be made stateless with a deterministic UUID-to-words mapping, eliminating the need to track alias state.
@Tiberium: Flags conceptual overlap with humanhash as prior art worth evaluating.
@Falimonda: Calls for cross-model benchmarks on hallucination rate and token usage comparing UUID vs id-agent IDs.
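One way to read @simedw's stateless suggestion is as a reversible encoding of the UUID's bits into words. The sketch below is an assumption about how that could work, not something the library is known to do. It uses a placeholder wordlist (`w000` to `wfff`) because the real list is not reproduced here, and the function names are hypothetical. A UUID is 128 bits and each word carries 12, so a lossless encoding needs 11 words rather than the default 8.

```typescript
// Placeholder wordlist standing in for the real 4096-word list.
const WORDS: string[] = Array.from(
  { length: 4096 },
  (_, i) => "w" + ("00" + i.toString(16)).slice(-3)
);
const WORD_INDEX = new Map<string, number>();
WORDS.forEach((word, i) => WORD_INDEX.set(word, i));

// Encode: 32 hex digits plus one padding digit = 33 digits = 11 groups of
// 3 hex digits, and each group (12 bits) selects one word.
function uuidToWords(uuid: string): string {
  const hex = uuid.replace(/-/g, "").toLowerCase();
  if (!/^[0-9a-f]{32}$/.test(hex)) throw new Error("not a UUID: " + uuid);
  const padded = hex + "0";
  const words: string[] = [];
  for (let i = 0; i < padded.length; i += 3) {
    words.push(WORDS[parseInt(padded.slice(i, i + 3), 16)]);
  }
  return words.join("-");
}

// Decode: look up each word's index, rebuild the hex string, drop the
// padding digit, and restore the 8-4-4-4-12 grouping.
function wordsToUuid(id: string): string {
  const parts = id.split("-");
  if (parts.length !== 11) throw new Error("expected 11 words");
  const hex = parts
    .map((word) => {
      const index = WORD_INDEX.get(word);
      if (index === undefined) throw new Error("unknown word: " + word);
      return ("00" + index.toString(16)).slice(-3);
    })
    .join("")
    .slice(0, 32);
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
          hex.slice(16, 20), hex.slice(20)].join("-");
}

const encoded = uuidToWords("123e4567-e89b-12d3-a456-426614174000");
console.log(encoded, wordsToUuid(encoded));
```

No alias state needs tracking with this approach, but the 11-word IDs cost more tokens than the 8-word default, and a shorter hash-based mapping would no longer be reversible without a lookup table.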