OpenAI upped the ante in the video generation space earlier this month, making Sora — its state-of-the-art text-to-video generator model — available to ChatGPT Plus users with Sora Turbo. Now, Google is gearing up to compete with the launch of its most advanced video generator.
On Monday, Google launched Veo 2, a text-to-video generator that boasts improvements over the company’s previous model, including a better understanding of real-world physics, which helps the AI produce more detailed and realistic videos, according to Google.
The generated videos can reach up to 4K resolution and, Google said, the model can tackle common video generator challenges, including hallucinations such as extra fingers. When evaluated by human raters against other leading video models, including Sora Turbo, Kling v1.5, and Meta Movie Gen, Veo 2 was voted best on overall performance and prompt adherence.
Veo 2 also understands cinematography language, such as a specific genre, lens, or angle. For example, if a user says “shallow depth of field,” Veo 2 knows to blur out the subject’s background to produce the effect. One video Google shared was created with a prompt that specifically said, “Shot with a 35mm lens on Kodak Portra 400 film.”
The model can be accessed in VideoFX in Google Labs, which requires joining an early access waitlist. The waitlist form asks for basic information such as age, name, place of residence, relevant work, and how you heard about it. Google said submissions are reviewed on a rolling basis.
Google also said it has improved its Imagen 3 image-generation model to generate “brighter and better composed” images. The improved model can render more diverse styles and output images with higher prompt fidelity and richer details and textures, according to the company.
This version of Imagen 3 is rolling out to the public via ImageFX in Google Labs starting today, and unlike VideoFX, it does not require a waitlist. The previous version of Imagen 3 was already very capable, ranking as the best AI image generator in ZDNET’s 2024 roundup.
Lastly, Google unveiled Whisk, a new experiment that is also available in Labs. This tool allows users to create an image (or input their own) and transform it into a new image in the style of a plushie, pin, or sticker. It combines Gemini and Imagen 3: Gemini writes a detailed caption of your image, and that caption is fed into Imagen 3 to create the final product.