Utilyze (open-source, by Systalyze) measures true GPU compute utilization via hardware performance counters, showing that nvtop and DCGM SM Active can read at or near 100% on workloads whose actual arithmetic throughput is in the single digits.
Key Takeaways
nvtop pins at 100% regardless of matrix multiplication size, since it reports the share of time any kernel is running; at N=256 on an H200, true compute utilization is roughly 1%.
DCGM SM Active reports 99% on a memory-bound LLM decode-style workload whose ground-truth arithmetic throughput is 6% of peak; because SM Active counts a cycle as active whenever at least one warp is assigned to the SM, warps waiting on HBM look identical to warps doing math.
Utilyze uses NVIDIA’s Nsight Perf SDK to cycle through hardware counters in rolling time windows, delivering near-zero overhead and continuous production-safe monitoring.
Two headline metrics: Compute SOL % and Memory SOL % (SOL = speed of light), each the achieved rate expressed as a percentage of the hardware’s theoretical peak; whichever is higher identifies the binding constraint.
Attainable SOL % models the realistic ceiling for a specific deployment (model architecture, parallelism, batch size), separating recoverable optimization budget from structural physics.
Cloud providers and hardware vendors surface the same misleading utilization number on their dashboards; as the Systalyze CEO notes, incentives to correct the misimpression are “complicated.”
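Once achieved rates are known, the two headline metrics reduce to simple ratios. The Python sketch below assumes the achieved FLOP/s and DRAM bytes/s have already been obtained (Utilyze derives them from hardware counters through the Nsight Perf SDK; that collection step is not shown) and that peak values are supplied by the caller. The names `Peaks`, `SolReport`, and `sol_report` are hypothetical and are not part of Utilyze's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peaks:
    """Theoretical hardware peaks; values are supplied by the caller (assumed)."""
    flops_per_s: float  # peak arithmetic throughput, FLOP/s
    bytes_per_s: float  # peak DRAM (HBM) bandwidth, bytes/s


@dataclass(frozen=True)
class SolReport:
    compute_sol: float  # achieved FLOP/s as % of peak FLOP/s
    memory_sol: float   # achieved bytes/s as % of peak bandwidth
    binding: str        # "compute" or "memory": whichever SOL % is higher


def sol_report(achieved_flops_per_s: float, achieved_bytes_per_s: float, peaks: Peaks) -> SolReport:
    """Hypothetical helper: turn achieved rates into Compute/Memory SOL %."""
    compute_sol = 100.0 * achieved_flops_per_s / peaks.flops_per_s
    memory_sol = 100.0 * achieved_bytes_per_s / peaks.bytes_per_s
    # The resource closer to its own peak is the one limiting the workload;
    # ties are reported as "compute" (an arbitrary choice in this sketch).
    binding = "compute" if compute_sol >= memory_sol else "memory"
    return SolReport(compute_sol, memory_sol, binding)


if __name__ == "__main__":
    # Round placeholder peaks, not official specs for any particular GPU.
    peaks = Peaks(flops_per_s=1000e12, bytes_per_s=5e12)
    # Made-up achieved rates for a memory-bound, decode-like kernel.
    report = sol_report(achieved_flops_per_s=60e12, achieved_bytes_per_s=4.5e12, peaks=peaks)
    print(f"compute SOL {report.compute_sol:.1f}%, memory SOL {report.memory_sol:.1f}%, bound by {report.binding}")
```

Running it prints `compute SOL 6.0%, memory SOL 90.0%, bound by memory`. The inputs are invented for illustration, not measurements. Such a kernel could keep SM Active near 100% while using only a small fraction of arithmetic peak.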
Hacker News Comment Review
Several practitioners default to power draw as a utilization proxy, which sidesteps the measurement problem but cannot distinguish compute-bound from memory-bound bottlenecks or quantify optimization headroom.
Commenters flagged that Utilyze v0.1.3 lacks nvidia-smi staples (temperature, fan speed, per-process memory), which blocks full drop-in replacement in ops workflows that monitor thermal and memory state alongside throughput.
A researcher running H100 clusters with vLLM raised an open question about whether Attainable SOL % adapts when users change inference hyperparameters, a gap that would matter in research environments where configurations change often.
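A standard roofline estimate shows why a deployment-specific ceiling could move when a hyperparameter such as batch size changes. This is a generic technique swapped in for illustration. It is not Utilyze's Attainable SOL % model and says nothing about how Utilyze actually behaves. The sketch assumes a 16-bit weight matrix whose reads dominate memory traffic during decode. Under that assumption, arithmetic intensity grows roughly linearly with batch size until it reaches the ridge point (peak FLOP/s divided by peak bandwidth). The function name `decode_compute_ceiling` and the peak values are hypothetical placeholders.

```python
def decode_compute_ceiling(batch: int, d_model: int, bytes_per_param: int,
                           peak_flops_per_s: float, peak_bytes_per_s: float) -> float:
    """Roofline upper bound on Compute SOL % for one d_model x d_model projection.

    Hypothetical simplification: only weight traffic is counted, and each
    weight is read once per decode step regardless of batch size.
    """
    flops = 2 * batch * d_model * d_model              # one multiply-add per weight per token
    bytes_moved = bytes_per_param * d_model * d_model  # weight reads dominate memory traffic
    intensity = flops / bytes_moved                    # FLOP per byte
    # Performance is capped by either the compute roof or the bandwidth slope.
    attainable = min(peak_flops_per_s, intensity * peak_bytes_per_s)
    return 100.0 * attainable / peak_flops_per_s


if __name__ == "__main__":
    # Round placeholder peaks (1000 TFLOP/s, 5 TB/s), not specs for a real GPU.
    for batch in (1, 16, 64, 256):
        ceiling = decode_compute_ceiling(batch, d_model=8192, bytes_per_param=2,
                                         peak_flops_per_s=1000e12, peak_bytes_per_s=5e12)
        print(f"batch {batch:>3}: compute ceiling {ceiling:.1f}%")
```

With these placeholder peaks the ridge point is 200 FLOP/byte. The ceiling rises from 0.5% at batch 1 to 8.0% at batch 16 and saturates at 100% by batch 256. Under this simplified model, a ceiling computed for one batch size would misstate the headroom after the batch size changes.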
Notable Comments
@xtimecrystal: v0.1.3 missing memory usage, processes, temperature, and fan speed blocks it from replacing nvidia-smi as a single-pane monitor.
@Cynddl: asks how Attainable SOL % handles user-tweaked hyperparameters in vLLM research clusters, a concrete edge case for the telemetry model.