Just a year and a half ago, the latest and greatest of Adobe’s Firefly generative AI offerings involved producing high-quality images from text with customization options, such as reference images. Since then, Adobe has pivoted into text-to-video generation and is now adding a slew of features to make it even more competitive.
Also: Forget Sora: Adobe launches ‘commercially safe’ AI video generator. How to try it
On Thursday, Adobe released a series of upgrades to its video capabilities that give users more control over the final generation, more options for creating the video, and new modalities to create in. Even though creating realistic AI-generated videos is an impressive feat that shows how far AI generation has come, one crucial aspect has been missing: sound.
Adobe’s new release seeks to give creative professionals the ability to use AI to create audio, too.
Generate sound effects
The new Generate Sound Effects (beta) allows users to create custom sounds by entering a text description of what they'd like generated. If users want even more control over the output, they can also use their voice to demonstrate the cadence, timing, and intensity they'd like the generated sound to follow.
For example, if you want to generate a lion's roar that matches when the subject of your video opens and closes its mouth, you can watch the video, record a clip of yourself making the noise in time with the character's movement, and then accompany it with a text prompt describing the sound you'd like created. You'll then be given multiple options to choose from and can pick the one that best matches the vibe you're going for.
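To make that workflow concrete, here is a minimal Python sketch of what a text-plus-voice request could look like. The endpoint, field names, and response shape below are hypothetical placeholders for illustration only, not Adobe's actual Firefly API.

```python
# Hypothetical sketch of a text-plus-voice sound-effects request.
# The endpoint, fields, and response shape are placeholders, NOT
# Adobe's actual Firefly API; they only illustrate the workflow.
import requests

API_URL = "https://example.com/v1/generate-sound-effects"  # placeholder
API_KEY = "your-api-key"  # placeholder credential

def generate_sound_effect(prompt: str, voice_reference_path: str) -> list[bytes]:
    """Send a text description plus a voice clip that sets the cadence,
    timing, and intensity of the generated sound; return candidate audio."""
    with open(voice_reference_path, "rb") as voice_clip:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            data={"prompt": prompt, "num_variations": "4"},
            files={"voice_reference": voice_clip},
            timeout=120,
        )
    response.raise_for_status()
    # Assume the service returns a URL per candidate sound.
    return [
        requests.get(url, timeout=60).content
        for url in response.json()["variation_urls"]
    ]

# Example: a roar timed to the character's mouth movements.
candidates = generate_sound_effect(
    prompt="a deep lion roar",
    voice_reference_path="my_roar_take.wav",
)
```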
Also: Adobe Firefly now generates AI images with OpenAI, Google, and Flux models – how to access them
While other video-generating models like Veo 3 can generate video with audio from text, what really stands out about this feature is the amount of control users have when inputting their own audio.
Before launch, I had the opportunity to watch a live demo of the feature in action. It was truly impressive to see how well the generated audio matched the input audio’s flow, while also incorporating the text prompt to create a sound that actually sounded like the intended output — no shade to the lovely demoer who did his best to sound like a lion roaring into the mic.
Generate visual avatars
Another feature launching in beta is Text to Avatar, which, as the name implies, allows users to turn scripts into avatar-led videos: videos that look like a live person reading the script. You can browse the avatar library, pick a custom background and accents, and then Firefly creates the final output.
Adobe shares that potential use cases for this feature include creating engaging video lessons with a virtual presenter, transforming text content into video articles for social media, or giving any materials a “human touch” (oh, the irony).
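For readers who think in code, here's a hypothetical sketch of how a script-to-avatar request might be shaped. The endpoint and parameter names are invented for illustration and are not Adobe's actual API.

```python
# Hypothetical sketch of a script-to-avatar request. The endpoint and
# parameters are illustrative placeholders, NOT Adobe's actual API.
import requests

API_URL = "https://example.com/v1/text-to-avatar"  # placeholder

def create_avatar_video(script: str, avatar_id: str, background: str) -> str:
    """Submit a script plus presentation choices; return a URL to the
    rendered avatar-led video."""
    response = requests.post(
        API_URL,
        headers={"Authorization": "Bearer your-api-key"},  # placeholder
        json={
            "script": script,          # the text the avatar will read
            "avatar_id": avatar_id,    # chosen from the avatar library
            "background": background,  # custom backdrop
        },
        timeout=300,
    )
    response.raise_for_status()
    return response.json()["video_url"]

# Example: a virtual presenter for a video lesson.
url = create_avatar_video(
    script="Welcome to today's lesson on color grading.",
    avatar_id="presenter-04",
    background="studio-neutral",
)
print(url)
```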
Other video improvements
Adobe also unveiled some practical, simple features that will improve users' video-generating experience. For example, users can now use Composition Reference for Video to upload a reference clip and apply its composition to a new generation.
Also: Why Adobe Firefly might be the only AI image tool that actually matters
This is a huge win for creators who rely on generative video because, no matter how good you get at writing prompts, a description can only capture a portion of the visual you're imagining. Now, you can spend less time explaining and still have the model understand your goal. In the live demo I watched, the final output matched the reference video's composition well.
A new Style Presets option also lets users customize their videos more easily by applying a visual style with a single tap. These presets include claymation, anime, line art, vector art, black and white, and more.
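To illustrate how these two controls might combine, here is one more hypothetical sketch: a generation request that supplies both a composition-reference clip and a named style preset. As before, the endpoint and field names are placeholders, not Adobe's actual API.

```python
# Hypothetical sketch pairing a composition-reference video with a
# style preset. All names are illustrative placeholders, NOT Adobe's API.
import requests

API_URL = "https://example.com/v1/generate-video"  # placeholder

def generate_video(prompt: str, reference_video_path: str, style_preset: str) -> str:
    """Generate a video whose framing follows the reference clip and
    whose look is set by a named style preset; return the result URL."""
    with open(reference_video_path, "rb") as ref_clip:
        response = requests.post(
            API_URL,
            headers={"Authorization": "Bearer your-api-key"},  # placeholder
            data={"prompt": prompt, "style_preset": style_preset},
            files={"composition_reference": ref_clip},
            timeout=600,
        )
    response.raise_for_status()
    return response.json()["video_url"]

# Example: reuse a camera move, rendered as claymation.
url = generate_video(
    prompt="a fox running through a snowy forest at dawn",
    reference_video_path="camera_move_reference.mp4",
    style_preset="claymation",  # also: anime, line art, vector art, b&w
)
```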