Show HN: Utilyze – an open source GPU monitoring tool more accurate than nvtop


TLDR

  • Utilyze (open source, by Systalyze) measures true GPU compute utilization via hardware performance counters, showing that nvtop and DCGM SM Active can read ~100% on workloads whose actual arithmetic throughput is in the single digits (about 1% to 6% in the examples given).

Key Takeaways

  • nvtop pins at 100% regardless of matrix multiplication size; at N=256 on an H200, true compute utilization is roughly 1%.
  • DCGM SM Active reports 99% on a memory-bound LLM decode-style workload where ground-truth arithmetic throughput is 6%; warps resident on SMs but waiting on HBM look identical to warps doing math.
  • Utilyze uses NVIDIA’s Nsight Perf SDK to cycle through hardware counters in rolling time windows, which keeps overhead near zero and makes continuous monitoring safe in production.
  • Two headline metrics: Compute SOL % and Memory SOL %, each measured against the hardware’s theoretical peak; the higher value identifies the binding constraint.
  • Attainable SOL % models the realistic ceiling for a specific deployment (model architecture, parallelism, batch size), separating headroom that optimization can recover from limits imposed by the hardware and the workload itself.
  • Cloud providers and hardware vendors surface the same misleading utilization number on their dashboards; as the Systalyze CEO notes, incentives to correct the misimpression are “complicated.”
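
The two headline metrics above can be illustrated with a minimal sketch. This is not Utilyze's implementation: the function name, the dataclass, and all numbers below are hypothetical, and the achieved rates are assumed to come from some counter source that is not shown. It only demonstrates the arithmetic of measuring throughput against a theoretical peak and treating the higher percentage as the binding constraint.

```python
from dataclasses import dataclass


@dataclass
class SolReading:
    compute_sol_pct: float  # achieved FLOP/s as a percentage of peak FLOP/s
    memory_sol_pct: float   # achieved bytes/s as a percentage of peak bytes/s
    binding: str            # "compute" or "memory", whichever is higher


def sol_percentages(achieved_flops, achieved_bytes, peak_flops, peak_bytes):
    """Return speed-of-light percentages for one sampling window.

    All four arguments are rates (FLOP/s or bytes/s). The peaks are
    caller-supplied assumptions, not values read from the device.
    """
    if peak_flops <= 0 or peak_bytes <= 0:
        raise ValueError("peak rates must be positive")
    compute = 100.0 * achieved_flops / peak_flops
    memory = 100.0 * achieved_bytes / peak_bytes
    binding = "compute" if compute >= memory else "memory"
    return SolReading(compute, memory, binding)


# Hypothetical decode-style window: little math, heavy HBM traffic.
# The peaks are illustrative placeholders, not a specific GPU's datasheet.
reading = sol_percentages(
    achieved_flops=60e12,
    achieved_bytes=4.0e12,
    peak_flops=1000e12,
    peak_bytes=4.8e12,
)
```

With these made-up inputs the window comes out memory-bound: Compute SOL % is low while Memory SOL % is high, the situation in which an occupancy-style metric such as SM Active would still read close to 100%.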

Hacker News Comment Review

  • Several practitioners default to power draw as a utilization proxy, which sidesteps the measurement problem but cannot distinguish compute-bound from memory-bound bottlenecks or quantify optimization headroom.
  • Commenters flagged that Utilyze v0.1.3 lacks nvidia-smi staples (temperature, fan speed, per-process memory), which blocks full drop-in replacement in ops workflows that monitor thermal and memory state alongside throughput.
  • A researcher running H100 clusters with vLLM raised an open question about whether Attainable SOL % adapts when users change inference hyperparameters, pointing to a real gap for dynamic research environments.

Notable Comments

  • @xtimecrystal: v0.1.3 lacks memory usage, process listing, temperature, and fan speed, which keeps it from replacing nvidia-smi as a single-pane monitor.
  • @Cynddl: asks how Attainable SOL % handles user-tweaked hyperparameters in vLLM research clusters, a concrete edge case for the telemetry model.
