# Human Typing Habits and Token Counts
Ordinary typing habits (typos, shorthand, filler words, pasted UUIDs) change token counts without changing intent, and tokenizers bill by character pattern, not by recovered meaning.
## What Matters
- `template` → 1 token; `tempalte` → 3 tokens (OpenAI). Same word, 3× cost from a single transposition.
- `assistant` → 1 token; `assitant` → 2 (OpenAI), 3 (Claude). Claude consistently tokenizes misspellings more expensively.
- Shorthand backfires: `pls` → 2 tokens (Claude), `thx` → 2, `w/o` → 3 (Claude) vs. 1 token each for the full dictionary words.
- A UUID like `019d6ce9-7cfe-753a-b6d6-df719510c9e3` costs 24 tokens (OpenAI) or 26 (Claude); an RFC 3339 timestamp costs 16–17 tokens.
- Expressive punctuation leaks: `Yes!!` → 2 tokens, `yesss` → 3, `reeeally` → 3; tone markers that rarely help the task.
- Suffixes fragment unpredictably: `describe` → 1, `describer` → 2, `describers` → 3; a tiny morpheme can double or triple the split.
- Boundary whitespace (leading/trailing spaces) inflates counts; normal internal spacing is generally safe.
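The patterns above suggest a cheap pre-tokenization cleanup. Here is a minimal sketch; the `normalize` helper and the `SHORTHAND` map are hypothetical illustrations, not part of any tokenizer's API. It trims boundary whitespace, collapses elongated letters and repeated punctuation, and expands common shorthand back to dictionary words before the text reaches a tokenizer.

```python
import re

# Illustrative shorthand map; extend with whatever abbreviations
# your inputs actually contain (hypothetical, not exhaustive).
SHORTHAND = {"pls": "please", "thx": "thanks", "w/o": "without"}

def normalize(text: str) -> str:
    """Cheaply rewrite text into the forms tokenizers price best."""
    # Strip leading/trailing whitespace and collapse internal runs.
    text = " ".join(text.split())
    # Collapse elongated letters: "yesss" -> "yes", "reeeally" -> "really".
    text = re.sub(r"(\w)\1{2,}", r"\1", text)
    # Collapse repeated punctuation: "Yes!!" -> "Yes!".
    text = re.sub(r"([!?.])\1+", r"\1", text)
    # Expand shorthand to full dictionary words.
    words = [SHORTHAND.get(w.lower(), w) for w in text.split()]
    return " ".join(words)
```

Note the trade-off: the normalization is lossy (elongation and shorthand may be deliberate tone), so it fits machine-bound prompts better than user-facing text.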