Gemini Omni

TLDR

  • Google’s Gemini Omni combines multimodal reasoning with video generation and editing, accepting any input type to create or transform video output.

Key Takeaways

  • Supports iterative, prompt-based video editing: change camera angles, swap objects, add synced audio, and adjust lighting with frame-accurate control.
  • Demonstrated use cases include claymation explainers, stop-motion educational content, chain-reaction marble shots, and alphabet sizzle reels.
  • All content created via the Gemini app, Google Flow, or YouTube Shorts includes SynthID watermarks and C2PA Content Credentials for provenance verification.
  • Verification tooling is expanding to Chrome and Search; C2PA metadata lets viewers confirm AI origin across the web.
  • Safety pipeline includes continuous automated evals, external human red teaming, and pre-release ethics reviews.

Hacker News Comment Review

  • Commenters who tested it against Seedance 2.0 found Gemini Omni Flash behind on quality, with Seedance 2.1 already closing any remaining gap.
  • Rigid-body physics remains a concrete weak point: one commenter’s standard Jenga-tower test produced discontinuous brick behavior, consistent with the discontinuities in rigid-body contact solving that AI models are known to struggle to learn.
  • Broader unease surfaced around deepfake potential and the cultural cost of AI video flattening visual credibility entirely.
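
A rough way to put a number on the Jenga-style physics test mentioned above is to track one brick across frames and flag any frame where its implied acceleration is implausibly large. The sketch below is only an illustration, not the commenter's method: the function name, the input format (per-frame positions from some hypothetical tracker), and the threshold are all assumptions. Real impacts also produce acceleration spikes, so a flag marks a frame for manual review and does not prove an artifact.

```python
from math import hypot


def flag_velocity_jumps(positions, fps, max_accel):
    """Return frame indices where the implied acceleration exceeds max_accel.

    positions: list of (x, y) tuples, one per frame, in metres
               (hypothetical output of an object tracker)
    fps:       frames per second of the clip
    max_accel: threshold in m/s^2 (an arbitrary, user-chosen value)
    """
    dt = 1.0 / fps

    # Finite-difference velocity between each pair of consecutive frames.
    velocities = [
        ((x1 - x0) / dt, (y1 - y0) / dt)
        for (x0, y0), (x1, y1) in zip(positions, positions[1:])
    ]

    flagged = []
    pairs = zip(velocities, velocities[1:])
    for i, ((vx0, vy0), (vx1, vy1)) in enumerate(pairs):
        # Magnitude of the change in velocity, divided by the time step.
        accel = hypot(vx1 - vx0, vy1 - vy0) / dt
        if accel > max_accel:
            # The i-th acceleration spans frames i, i+1, and i+2;
            # report the last frame of that window.
            flagged.append(i + 2)
    return flagged


# A brick that sits still, teleports one metre in a single frame, then stops.
teleporting = [(0, 0), (0, 0), (0, 0), (1, 0), (1, 0)]
jump_frames = flag_velocity_jumps(teleporting, fps=10, max_accel=50)

# A brick drifting at a steady speed should raise no flags.
steady = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0)]
steady_frames = flag_velocity_jumps(steady, fps=10, max_accel=50)
```

In the first example both the sudden start and the sudden stop are flagged, which matches the "explosion" pattern described in the comments: motion that appears and disappears within a frame or two.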

Notable Comments

  • @manas96: Uses Jenga tower collapse as a physics benchmark; Gemini Omni Flash failed realistic rigid-body contact, producing sudden “explosion” artifacts.
  • @kenjackson: Notes AI video has degraded his ability to find any video impressive; authenticity is now the only axis that matters to him.
