Google's Gemini Omni turns photos, audio, and text into video — and that's just the beginning

When Google launched Gemini three years ago, its goal was to create a large multimodal model: a single neural network that was trained on text, images, audio, and video and could generate content in those same formats.

Today, at its Google I/O conference, the company took a step toward that goal with Gemini Omni, a new family of models that Google CEO Sundar Pichai says will be able to “create anything from any input.”

Omni starts with video. Users can now combine images, audio, video, and text, and instead of simply stitching the inputs together, Omni takes all of them into account to produce a coherent output. The results are high-quality videos that demonstrate an understanding of physics, culture, history, and science.

Omni also lets users make edits with plain-language prompts instead of complex editing programs, similar to Google’s Nano Banana.

Google already has a dedicated video model, Veo, which allows users to turn text and images into videos, as well as manage and edit avatars. But Google DeepMind’s chief marketing officer Nicole Brichtova says today’s release isn’t just about changing Veo: “It’s the next step in the evolution of integrating Gemini’s intelligence with our video delivery capabilities.”

Koray Kavukcuoglu, DeepMind’s chief technology officer, gave reporters one example at a press conference on Monday: when Omni was given a simple prompt like “a claymation explainer of protein folding,” it quickly produced a stop-motion explainer video, with narration describing how proteins start as chains of amino acids and fold into a three-dimensional shape.

Omni’s long-term vision is much broader: any modality could be used to create any other, such as images from audio, or audio from video.

“When we first announced Gemini, it was our first AI model to be multimodal,” Pichai said during the briefing. “We knew that training on a combination of text, code, audio, images, and video could help us understand the world more deeply. With world models, AI is moving from predictions to reality. Gemini Omni is the next step.”

As part of the release, users will also be able to create videos with their own digital avatars, something OpenAI is known for through its recently released Sora app and its Cameos feature. To guard against deepfakes, users have to go through a dedicated setup process, which includes recording themselves and speaking a series of numbers, according to Brichtova. The avatar is then saved for future use.

In addition, all videos created by Omni will include Google’s SynthID digital watermark, which allows users to verify whether videos were created through Gemini products.

The first model in the family is Gemini Omni Flash, which rolls out today in the Gemini app, YouTube Shorts, and the AI creative studio Flow. Flash will be able to produce 10-second videos, which Brichtova says is not a limit of the model but a decision based on the desire to get it into more hands and the expectation that many users will not want to make longer videos. Longer videos are on the way soon, though.

Google seems to be positioning Omni Flash as a consumer tool. The examples of using digital avatars that Brichtova and Gabe Barth-Maron, a researcher at DeepMind, gave TechCrunch were all personal: creating a video of yourself winning an award or going to the moon, or removing a passerby from the background of a video you took on vacation.

Barth-Maron put it plainly: “It’s like self-selecting memes.”

“We focused on making this easy to use for consumers,” Brichtova said. “There aren’t a lot of movie franchises that have broken the mold with consumers, so this is our play to do that.”

The ease of use comes with a caveat: Brichtova and Barth-Maron pointed out that edit requests must be specific; otherwise Omni risks inadvertently changing things the user wants to keep, a problem that Nano Banana users will have run into.

Even with the close focus on consumers, Omni’s business potential is obvious, and Google will make Omni available via an API in the coming weeks. The avatar creation tool, a capability that’s currently available on Shorts, is something Google hopes developers will adopt. But more broadly, an end-to-end multimodal workflow could be game-changing for advertisers and filmmakers.

Startup Luma AI is doing something similar with a tool that can create an entire marketing campaign from a short brief and a product image, with the help of its “integrated” model.

“We are very proud of the way the brand delivers sound, which is very useful for things like advertising,” said Brichtova. “If you want something somewhere, or even a description, it has to be right…”

Professional use cases may be better served by the Omni Pro version, which should perform better across all of Omni’s capabilities. Google hasn’t said when it will release Pro, but Brichtova said it will happen “when we feel like we’re about to change beyond Flash.”
