The Prompt API

· devtools web hardware · Source ↗

TLDR

  • Chrome’s built-in Prompt API exposes Gemini Nano directly in the browser via LanguageModel, supporting text, image, and audio input with structured output.

Key Takeaways

  • Available in Chrome 138+ origin trial; requires Windows 10/11, macOS 13+, Linux, or ChromeOS on Chromebook Plus — no mobile support yet.
  • Hardware bar is real: 22 GB free storage, 4+ GB VRAM or 16 GB RAM with 4 CPU cores; audio input requires a GPU.
  • API surface includes LanguageModel.create(), prompt(), promptStreaming(), append(), and a responseConstraint option for JSON Schema or regex-constrained structured output (see the first sketch after this list).
  • Multimodal inputs cover AudioBuffer, Blob, HTMLCanvasElement, HTMLImageElement, VideoFrame, and more; output is text only.
  • Session context is configurable via initialPrompts, topK, temperature, expectedInputs, and expectedOutputs, with English, Japanese, and Spanish as the supported languages; the second sketch below pairs these options with a multimodal prompt.
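
The core flow might look roughly like the sketch below. It assumes the origin-trial API shape described above (a global LanguageModel with create(), prompt(), promptStreaming(), and a responseConstraint prompt option); the run() wrapper, the review-rating prompt, and the JSON Schema are illustrative and not taken from the article, and option names may shift between Chrome releases.

```ts
// Experimental global exposed by Chrome's Prompt API origin trial;
// it is not in TypeScript's DOM typings yet, so declare it loosely here.
declare const LanguageModel: any;

async function run() {
  // Check whether the model can be used at all; create() on a
  // "downloadable" model may trigger a large one-time download.
  const availability = await LanguageModel.availability();
  if (availability === "unavailable") return;

  const session = await LanguageModel.create();

  // One-shot prompt returning the full text response.
  const haiku = await session.prompt("Write a haiku about service workers.");
  console.log(haiku);

  // Streaming variant: chunks arrive as they are generated.
  const stream = session.promptStreaming("Summarize the Prompt API in one paragraph.");
  for await (const chunk of stream) {
    console.log(chunk);
  }

  // Structured output: constrain the response with a JSON Schema
  // (illustrative schema); the result is a string you parse yourself.
  const rating = await session.prompt("Rate this review: 'Great product!'", {
    responseConstraint: {
      type: "object",
      properties: { sentiment: { type: "string" }, stars: { type: "number" } },
      required: ["sentiment", "stars"],
    },
  });
  console.log(JSON.parse(rating));

  session.destroy();
}
```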

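Session configuration and multimodal input could combine as in this second sketch. The describeImage() helper, the specific option values, and the message shape ({ role, content: [{ type, value }] }) are assumptions built from the option names listed above, not code from the article.

```ts
// Same experimental global as in the first sketch; typed loosely.
declare const LanguageModel: any;

async function describeImage(img: HTMLImageElement, clip: AudioBuffer) {
  // Session-level configuration: sampling parameters, initial context
  // messages, and declared input/output modalities and languages.
  const session = await LanguageModel.create({
    temperature: 0.7,
    topK: 3,
    initialPrompts: [
      { role: "system", content: "You are a concise image and audio captioner." },
    ],
    expectedInputs: [
      { type: "image" },
      { type: "audio" },
      { type: "text", languages: ["en"] },
    ],
    expectedOutputs: [{ type: "text", languages: ["en"] }],
  });

  // append() adds messages to the session context without generating a response.
  await session.append([
    { role: "user", content: [{ type: "text", value: "Keep answers under 20 words." }] },
  ]);

  // A single prompt can mix text parts with image and audio parts;
  // the output is still text only.
  const caption = await session.prompt([
    {
      role: "user",
      content: [
        { type: "text", value: "Describe the image, then the audio clip." },
        { type: "image", value: img },
        { type: "audio", value: clip },
      ],
    },
  ]);

  session.destroy();
  return caption;
}
```
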
Hacker News Comment Review

  • Consensus: the hardware requirements and Gemini Nano’s benchmark scores (46-56% MMLU) are the main practical blockers; newer quantized Gemma 3n E2B/E4B models already outperform it and can be bundled in extensions today.
  • Real-world shipping note from a practitioner: the API works for lightweight local inference and is privacy-preserving with no user setup, but the model download UX is poor and the model itself is weak for anything non-trivial.
  • Cross-browser standardization concern is unresolved; Mozilla’s standards-position issue is open, and per-browser model variation creates a testing and consistency problem for developers.

Notable Comments

  • @domenicd: Former API design lead shares a detailed writeup of design considerations at domenic.me/builtin-ai-api-design/.
  • @avaer: Shipped it as “poor person’s ollama” for low-end tasks; confirms the model download is “orders of magnitude greater” than expected and degrades first-run UX significantly.
  • @meander_water: Benchmarks Gemini Nano-1 at 46% MMLU and Nano-2 at 56% MMLU vs Gemma 3n E4B at 69.4% MMLU, arguing extension-bundled quantized models are strictly better right now.

Original | Discuss on HN