Tired of video editing? Google’s Gemini Omni changes scenes when you ask

What you need to know

Google unveiled Gemini Omni, a new multimodal AI model built to generate and edit videos using text, images, audio, and video inputs.
The model is designed to be context-aware and physics-aware, helping generated videos look more realistic and coherent over longer creative sessions.
Gemini Omni remembers previous instructions during multi-step edits, which could make iterative video creation much smoother.

Gemini is going to be much more than a chatbot. During its I/O event today, Google announced a new multimodal AI model called Gemini Omni, which is designed to help you create and edit videos from just about any kind of input you give it.

According to the company, Gemini Omni can combine text, image, audio, and video references into fully generated clips that are designed to stay coherent across scenes and edits. This means the AI no longer relies on traditional prompts alone.

Until now, AI video tools have felt mostly fragmented. Some excel at visuals but suck at storytelling, while others cannot keep characters or environments consistent between edits. Google is pitching Gemini Omni as a solution to that disconnect: a model designed to be context-aware and physics-aware, and to maintain continuity over longer creative sessions.

Since last year, Google has been steadily pushing Gemini deeper into creative workflows, with Nano Banana bringing the spotlight to Gemini-powered image generation and editing. Google’s blog post calls Omni the next big step in that strategy, describing it as Gemini’s move from just reasoning about content to actually creating it.

One of the key features of Gemini Omni is its conversational editing capability. With Gemini Omni, users can simply tell the system what to change in natural language, rather than having to fire up a convoluted editing suite and tinker with clips frame-by-frame.

The company also says the model recalls previous commands during multi-step revisions, which could make iterative editing feel far less chaotic.

Google claims Omni has a better grasp of concepts like gravity, kinetic energy, and fluid dynamics than previous systems, meaning it generates more convincing scenes. The model combines Gemini’s world knowledge with visual generation, enabling it to generate explainers, educational visuals, and more narrative-driven scenes from brief prompts.

Mix any input together

Gemini Omni on stage at Google I/O 2026

(Image credit: Google)

Another major change is how flexible the inputs are. Gemini Omni can mix images, drawings, videos, text prompts, and audio references in one workflow. Google says creators can start with rough sketches or existing footage and then build those out into more polished cinematic clips. The system also supports style and motion references, which give users more control over the actual feel of the final video.

Google is also testing AI-generated digital avatars via its Omni project. These avatars look and sound like the users, so people can create custom video content without being on camera all the time.

That raises obvious concerns around misuse and deepfakes, so Google is emphasizing its safety features up front. All Omni-generated videos will carry a watermark via the company’s SynthID technology, which invisibly tags AI-generated material for verification purposes.

Gemini Omni Flash is the first version being made public. Google says it’s rolling out worldwide to Google AI Pro and Ultra subscribers through the Gemini app and Google Flow, and users of YouTube Shorts and YouTube Create will also start to get access this week at no extra cost. API access for enterprise clients and developers will come in the next few weeks.

Android Central’s Take

Google can talk all day about physics realism, continuity, and SynthID safeguards, but none of that automatically solves the bigger issue: when everyone can generate endless polished videos in seconds, originality becomes harder to spot. Gemini Omni does look powerful, and it could be one of Google’s most important AI launches yet. But if all platforms suddenly become populated with perfectly generated talking avatars and synthetic storytelling, users may spend as much time figuring out what’s real as they do consuming the content.