Using "underdrawings" for accurate text and numbers

TLDR

  • Generate a precise SVG layout of text/numbers, then pass it as an image input to Gemini or ChatGPT-Images to paint over, yielding accurate results neither model achieves alone.

Key Takeaways

  • Both Gemini 3.0 Pro and ChatGPT-Images-2 fail on complex numbered layouts (e.g. 50-stone spiral board) without the underdrawing method.
  • Layer 1 uses SVG/HTML for deterministic math and positioning; Layer 2 uses image gen for visual style, treating the SVG export as a ControlNet-style reference image.
  • Workflow is fully automatable: Claude Code or Codex can generate the SVG, pass it to Gemini Pro, and apply the style prompt end-to-end.
  • Results are not perfect every time; the method improves reliability but does not guarantee pixel-accurate output on every run.
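The Layer 1 step can be sketched as plain code: a minimal, hypothetical example of generating a deterministic SVG underdrawing for the 50-stone spiral board, with every stone's position and number computed exactly. The function name, the Archimedean-spiral parameters, and the angular step are illustrative assumptions, not the article's actual code; the exported SVG would then be rasterized and handed to the image model as the reference image.

```python
import math

def spiral_board_svg(n_stones=50, size=600, a=20.0, b=9.0):
    """Layer 1 sketch: place numbered stones along an Archimedean
    spiral (r = a + b*t) and emit an SVG underdrawing. All math
    and positioning is deterministic -- no image model involved."""
    cx = cy = size / 2
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">']
    parts.append(f'<rect width="{size}" height="{size}" fill="white"/>')
    for i in range(n_stones):
        t = i * 0.55               # angular step in radians (illustrative constant)
        r = a + b * t              # spiral radius grows linearly with angle
        x = cx + r * math.cos(t)
        y = cy + r * math.sin(t)
        parts.append(f'<circle cx="{x:.1f}" cy="{y:.1f}" r="16" fill="none" stroke="black"/>')
        parts.append(f'<text x="{x:.1f}" y="{y:.1f}" font-size="12" '
                     f'text-anchor="middle" dominant-baseline="central">{i + 1}</text>')
    parts.append('</svg>')
    return "\n".join(parts)

svg = spiral_board_svg()
```

Because the numbering and geometry are computed rather than generated, Layer 2 only has to restyle pixels, not invent digits, which is where image models fail on their own.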

Hacker News Comment Review

  • Veteran SD users note this is essentially img2img with a code-generated base, a technique common since early Stable Diffusion and ControlNet days, making the “discovery” framing feel overstated.
  • Commenters debate whether text/number failures reflect fundamental LLM limitations or just unsolved engineering, citing rapid progress on tasks once deemed impossible (character counting, phonetics).
  • A mirrored idea surfaced: use image gen first to produce a photorealistic reference, then trace it into SVG, inverting the pipeline for tasks like complex SVG illustration.

Notable Comments

  • @dllu: Proposes the inverse pipeline: generate a photorealistic image first, then have an LLM trace it into SVG for tasks like “pelican riding a bike.”
  • @IdiotSavage: Notes that even with precise prompts, style details like “low-angle” and “artisanal” are quietly ignored by the model.