The Prompt API

· devtools web hardware · Source ↗

TLDR

  • Chrome’s built-in Prompt API exposes Gemini Nano directly in the browser via LanguageModel, supporting text, image, and audio input with structured output.

Key Takeaways

  • Available in Chrome 138+ origin trial; requires Windows 10/11, macOS 13+, Linux, or ChromeOS on Chromebook Plus — no mobile support yet.
  • Hardware bar is real: 22 GB free storage, 4+ GB VRAM or 16 GB RAM with 4 CPU cores; audio input requires a GPU.
  • API surface includes LanguageModel.create(), prompt(), promptStreaming(), append(), and a responseConstraint option for JSON Schema or regex-constrained structured output (see the first sketch after this list).
  • Multimodal inputs cover AudioBuffer, Blob, HTMLCanvasElement, HTMLImageElement, VideoFrame, and more; output is text only.
  • Session context is configurable via initialPrompts, topK, temperature, expectedInputs, and expectedOutputs, with English, Japanese, and Spanish as the supported languages; the second sketch below pairs these options with a multimodal prompt.
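
The core flow might look roughly like the sketch below. It assumes the origin-trial API shape described above (a global LanguageModel with create(), prompt(), promptStreaming(), and a responseConstraint prompt option); the run() wrapper, the review-rating prompt, and the JSON Schema are illustrative and not taken from the article, and option names may shift between Chrome releases.

```ts
// Experimental global exposed by Chrome's Prompt API origin trial;
// it is not in TypeScript's DOM typings yet, so declare it loosely here.
declare const LanguageModel: any;

async function run() {
  // Check whether the model can be used at all; create() on a
  // "downloadable" model may trigger a large one-time download.
  const availability = await LanguageModel.availability();
  if (availability === "unavailable") return;

  const session = await LanguageModel.create();

  // One-shot prompt returning the full text response.
  const haiku = await session.prompt("Write a haiku about service workers.");
  console.log(haiku);

  // Streaming variant: chunks arrive as they are generated.
  const stream = session.promptStreaming("Summarize the Prompt API in one paragraph.");
  for await (const chunk of stream) {
    console.log(chunk);
  }

  // Structured output: constrain the response with a JSON Schema
  // (illustrative schema); the result is a string you parse yourself.
  const rating = await session.prompt("Rate this review: 'Great product!'", {
    responseConstraint: {
      type: "object",
      properties: { sentiment: { type: "string" }, stars: { type: "number" } },
      required: ["sentiment", "stars"],
    },
  });
  console.log(JSON.parse(rating));

  session.destroy();
}
```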

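Session configuration and multimodal input could combine as in this second sketch. The describeImage() helper, the specific option values, and the message shape ({ role, content: [{ type, value }] }) are assumptions built from the option names listed above, not code from the article.

```ts
// Same experimental global as in the first sketch; typed loosely.
declare const LanguageModel: any;

async function describeImage(img: HTMLImageElement, clip: AudioBuffer) {
  // Session-level configuration: sampling parameters, initial context
  // messages, and declared input/output modalities and languages.
  const session = await LanguageModel.create({
    temperature: 0.7,
    topK: 3,
    initialPrompts: [
      { role: "system", content: "You are a concise image and audio captioner." },
    ],
    expectedInputs: [
      { type: "image" },
      { type: "audio" },
      { type: "text", languages: ["en"] },
    ],
    expectedOutputs: [{ type: "text", languages: ["en"] }],
  });

  // append() adds messages to the session context without generating a response.
  await session.append([
    { role: "user", content: [{ type: "text", value: "Keep answers under 20 words." }] },
  ]);

  // A single prompt can mix text parts with image and audio parts;
  // the output is still text only.
  const caption = await session.prompt([
    {
      role: "user",
      content: [
        { type: "text", value: "Describe the image, then the audio clip." },
        { type: "image", value: img },
        { type: "audio", value: clip },
      ],
    },
  ]);

  session.destroy();
  return caption;
}
```
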
Hacker News Comment Review

  • Consensus: the hardware requirements and Gemini Nano’s benchmark scores (46-56% MMLU) are the main practical blockers; newer quantized Gemma 3n E2B/E4B models already outperform it and can be bundled in extensions today.
  • Real-world shipping note from a practitioner: the API works for lightweight local inference and is privacy-preserving with no user setup, but the model download UX is poor and the model itself is weak for anything non-trivial.
  • Cross-browser standardization concern is unresolved; Mozilla’s standards-position issue is open, and per-browser model variation creates a testing and consistency problem for developers.

Notable Comments

  • @domenicd: Former API design lead shares a detailed writeup of design considerations at domenic.me/builtin-ai-api-design/.
  • @avaer: Shipped it as “poor person’s ollama” for low-end tasks; confirms the model download is “orders of magnitude greater” than expected and degrades first-run UX significantly.
  • @meander_water: Benchmarks Gemini Nano-1 at 46% MMLU and Nano-2 at 56% MMLU vs Gemma 3n E4B at 69.4% MMLU, arguing extension-bundled quantized models are strictly better right now.

Original | Discuss on HN