SSE-based resumable, cancellable, multi-device LLM token streaming is technically possible but costly, requiring per-token DB writes, cancel markers, and polling workarounds.
Key Takeaways
Resumable streams via Last-Event-ID require storing every token in a shared DB, since with stateless replicas a reconnect may be routed to a different server than the one that started the stream.
Per-token DB writes create heavy write amplification: each token event carries ~125 chars of metadata for a few chars of text delta, and all tokens are discarded after the full response lands.
Cancellations need a separate POST /cancel/{response_id} endpoint writing a cancel marker to shared state; the LLM inference process polls for that marker between tokens.
Multi-device support splits into two problems: serving stored tokens to late joiners (solved by the shared DB) and notifying device B of new prompts from device A (not solved by SSE alone; it requires polling or long-polling).
The author works at Ably and argues a pub/sub transport decouples connection lifetime from agent lifecycle, handles rewind/history, and compacts token deltas automatically.
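The resume and cancel mechanics in the takeaways above can be sketched as follows. This is a minimal illustration, not the article's implementation: TokenStore is a hypothetical in-memory stand-in for the shared DB, event ids are assumed to be 1-based integers, and the names generate, replay_after, and format_sse are invented for this example. A real server would parse the Last-Event-ID request header into the integer passed to replay_after.

```python
from dataclasses import dataclass, field


@dataclass
class TokenStore:
    """Hypothetical in-memory stand-in for the shared DB (an assumption)."""
    events: dict[str, list[str]] = field(default_factory=dict)
    cancelled: set[str] = field(default_factory=set)

    def append(self, response_id: str, delta: str) -> int:
        """Persist one token delta and return its 1-based event id."""
        log = self.events.setdefault(response_id, [])
        log.append(delta)
        return len(log)

    def replay_after(self, response_id: str, last_event_id: int) -> list[tuple[int, str]]:
        """Return (event_id, delta) pairs with an id greater than last_event_id."""
        log = self.events.get(response_id, [])
        return [(i, d) for i, d in enumerate(log, start=1) if i > last_event_id]

    def cancel(self, response_id: str) -> None:
        """The write a POST /cancel/{response_id} handler would perform."""
        self.cancelled.add(response_id)

    def is_cancelled(self, response_id: str) -> bool:
        return response_id in self.cancelled


def format_sse(event_id: int, delta: str) -> str:
    """Frame one delta as an SSE event; the id field feeds Last-Event-ID."""
    # A delta containing newlines must be split across several data: lines.
    data_lines = "".join(f"data: {line}\n" for line in delta.split("\n"))
    return f"id: {event_id}\n{data_lines}\n"


def generate(store: TokenStore, response_id: str, tokens: list[str]) -> int:
    """Write tokens one at a time, checking the cancel marker between tokens."""
    written = 0
    for tok in tokens:
        if store.is_cancelled(response_id):
            break  # stop producing; already stored tokens stay replayable
        store.append(response_id, tok)
        written += 1
    return written


store = TokenStore()
generate(store, "r1", ["Hel", "lo", " wor", "ld"])
# A client that reconnects with Last-Event-ID: 2 receives only events 3 and 4,
# regardless of which replica handles the reconnect.
frames = [format_sse(i, d) for i, d in store.replay_after("r1", 2)]
```

Every call to append here corresponds to one DB write per token, which is the write amplification the takeaways describe.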