Show HN: Semble – Code search for agents that uses 98% fewer tokens than grep


TLDR

  • Semble is a CPU-only code search library for agents combining Model2Vec embeddings and BM25 with RRF fusion, indexing repos in ~250ms and querying in ~1.5ms.
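A minimal sketch of reciprocal rank fusion (RRF), the technique named above for merging the BM25 and embedding result lists; the function name, document ids, and constant are illustrative, not Semble's actual API.

```python
# Reciprocal Rank Fusion: each ranked list contributes 1 / (k + rank) to a
# document's fused score, so documents ranked highly by either retriever
# surface near the top. k=60 is the conventional smoothing constant.

def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of doc ids (best first) into one ordering."""
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical hit lists from the lexical (BM25) and semantic (embedding) sides
bm25_hits = ["parser.py:42", "lexer.py:10", "tests/test_parser.py:5"]
embed_hits = ["parser.py:42", "docs/usage.md:1", "lexer.py:10"]
print(rrf_fuse([bm25_hits, embed_hits]))
# parser.py:42 ranks first: both retrievers placed it at rank 1
```

Because RRF operates on ranks rather than raw scores, it needs no score normalization between the lexical and semantic retrievers.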

Key Takeaways

  • Achieves NDCG@10 of 0.854, 99% of CodeRankEmbed Hybrid quality, while indexing 218x faster and querying 11x faster.
  • Returns only relevant chunks, using 98% fewer tokens than grep+read; reaches 94% recall at 2k tokens vs 100k context needed by grep+read.
  • Runs entirely on CPU with no API keys, GPU, or external services; installable via pip or uv, works as MCP server or bash tool via AGENTS.md.
  • Ranking uses adaptive lexical/semantic weighting, definition boosts, identifier stemming, file coherence scoring, and noise penalties for test/legacy files.
  • Supports Claude Code, Cursor, Codex, and OpenCode via MCP or bash integration; the "semble savings" command tracks token savings over time.
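The ranking heuristics above (definition boosts, noise penalties for test/legacy files) could be sketched as post-retrieval score adjustments. This is a hypothetical illustration: the patterns, multipliers, and function names are guesses, not Semble's actual values or code.

```python
import re

# Matches a chunk that opens with a definition (Python/Rust/Go style keywords)
DEF_PATTERN = re.compile(r"^\s*(def|class|fn|func)\s+\w+")

def adjust_score(base_score, chunk_text, file_path):
    """Apply illustrative definition boosts and noise penalties to a score."""
    score = base_score
    if DEF_PATTERN.search(chunk_text):
        score *= 1.2   # boost chunks that contain a definition
    if "test" in file_path or "legacy" in file_path:
        score *= 0.7   # penalize likely-noisy test/legacy files
    return score

print(adjust_score(1.0, "def parse(tokens):", "src/parser.py"))       # boosted
print(adjust_score(1.0, "def old_parse():", "legacy/parser_v1.py"))   # boosted, then penalized
```

Multiplicative adjustments like these compose cleanly with fused retrieval scores without requiring the underlying retrievers to know about file-level signals.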

Hacker News Comment Review

  • The core open question is whether agents actually trust Semble’s results in practice: models heavily RL’d on grep may ignore non-grep outputs and retry, erasing token savings entirely.
  • Benchmarks measure only retrieval accuracy (NDCG@10), not end-to-end agent task performance; authors acknowledge this gap and say agent benchmarks are on the roadmap.
  • The “98% fewer tokens than grep” framing drew skepticism, since grep by itself returns only matching lines and consumes few tokens; the meaningful baseline is grep followed by reading the matched files (grep+read), which the authors confirmed is the intended comparison.

Notable Comments

  • @jerezzprime: cites evidence from RTK/LSP experiments that agents retry or re-read when they distrust non-grep tool results, nullifying the token savings.

Original | Discuss on HN