DuckDB’s fts extension brings BM25-ranked full-text search to any data source with a two-line install, useful for exploratory work on email archives or large text corpora.
Key Takeaways
Install is trivial: INSTALL fts; LOAD fts; then PRAGMA create_fts_index('table', 'id', 'col1', 'col2') – no separate service needed.
BM25 tuning exposed directly in SQL: k₁ controls term-frequency weight, b controls document-length normalization, conjunctive := 1 requires all terms.
Stemming via Snowball (same as Postgres/Elasticsearch); debug unexpected stem misses with the Python snowballstemmer library before blaming the index.
Missing vs. Postgres: no ts_headline match highlighting, no phrase queries, no synonym dictionaries – workarounds require piping to grep or tmux.
Author’s verdict: good enough for exploratory use; easy to graduate to Postgres or Elasticsearch by dumping DuckDB if richer FTS is needed.
Hacker News Comment Review
Commenters note DuckDB’s ability to attach and join across heterogeneous data sources (Parquet, JSON, existing DBs) as the broader draw, with FTS as one capability on top.
A practical use case raised: publishing browsable, searchable mailbox archives publicly – current open-source tooling is thin and preprocessing friction is real.
ClickHouse Local surfaced as a less-marketed but functionally similar alternative worth benchmarking against DuckDB for local analytical workloads.
Notable Comments
@jiehong: asks whether anyone has compared DuckDB and ClickHouse Local directly, flagging ClickHouse as undermarketed but comparable.
@rahimnathwani: built a live demo publishing a small email archive via DuckDB and Claude; notes search quality and preprocessing overhead as unsolved problems.