Using "underdrawings" for accurate text and numbers

TLDR

  • Generate a precise SVG layout of text/numbers, then pass it as an image input to Gemini or ChatGPT-Images to paint over, yielding accurate results neither model achieves alone.

Key Takeaways

  • Both Gemini 3.0 Pro and ChatGPT-Images-2 fail on complex numbered layouts (e.g. 50-stone spiral board) without the underdrawing method.
  • Layer 1 uses SVG/HTML for deterministic math and positioning; Layer 2 uses image gen for visual style, treating the SVG export as a ControlNet-style reference image.
  • Workflow is fully automatable: Claude Code or Codex can generate the SVG, pass it to Gemini Pro, and apply the style prompt end-to-end.
  • Results are not perfect every time; the method improves reliability but does not guarantee pixel-accurate output on every run.
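The Layer 1 step can be sketched as plain code: a minimal, hypothetical example of generating a deterministic SVG underdrawing for the 50-stone spiral board, with every stone's position and number computed exactly. The function name, the Archimedean-spiral parameters, and the angular step are illustrative assumptions, not the article's actual code; the exported SVG would then be rasterized and handed to the image model as the reference image.

```python
import math

def spiral_board_svg(n_stones=50, size=600, a=20.0, b=9.0):
    """Layer 1 sketch: place numbered stones along an Archimedean
    spiral (r = a + b*t) and emit an SVG underdrawing. All math
    and positioning is deterministic -- no image model involved."""
    cx = cy = size / 2
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">']
    parts.append(f'<rect width="{size}" height="{size}" fill="white"/>')
    for i in range(n_stones):
        t = i * 0.55               # angular step in radians (illustrative constant)
        r = a + b * t              # spiral radius grows linearly with angle
        x = cx + r * math.cos(t)
        y = cy + r * math.sin(t)
        parts.append(f'<circle cx="{x:.1f}" cy="{y:.1f}" r="16" fill="none" stroke="black"/>')
        parts.append(f'<text x="{x:.1f}" y="{y:.1f}" font-size="12" '
                     f'text-anchor="middle" dominant-baseline="central">{i + 1}</text>')
    parts.append('</svg>')
    return "\n".join(parts)

svg = spiral_board_svg()
```

Because the numbering and geometry are computed rather than generated, Layer 2 only has to restyle pixels, not invent digits, which is where image models fail on their own.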

Hacker News Comment Review

  • Veteran SD users note this is essentially img2img with a code-generated base, a technique common since early Stable Diffusion and ControlNet days, making the “discovery” framing feel overstated.
  • Commenters debate whether text/number failures reflect fundamental LLM limitations or just unsolved engineering, citing rapid progress on tasks once deemed impossible (character counting, phonetics).
  • A mirrored idea surfaced: use image gen first to produce a photorealistic reference, then trace it into SVG, inverting the pipeline for tasks like complex SVG illustration.

Notable Comments

  • @dllu: Proposes the inverse pipeline: generate a photorealistic image first, then have an LLM trace it into SVG for tasks like “pelican riding a bike.”
  • @IdiotSavage: Notes that even with precise prompts, style details like “low-angle” and “artisanal” are quietly ignored by the model.