Is it time for end-to-end AI videos?


We’ve all seen plenty of AI-generated videos floating around social media. Most are dead internet theory at worst, cute at best. Of course, there are some that are scary good, but until recently, they were resource-intensive to develop in terms of time, tokens, and even the hardware required.

Google has decided to change that. It has been launching AI offerings for every major category (think Firebase Studio for vibe coding), and Veo is its answer to AI-generated videos. And it’s impressive.

Interestingly (in a side-eye kind of way), some of its training data is from YouTube. Google hasn’t necessarily elaborated on what that means precisely, but do with that what you will.

In any case, Veo 3 appears to be a major breakthrough: you can now include audio generation alongside video generation, in a way that doesn’t look like your videos are defying the laws of physics.

After spending time testing it myself, I can say it’s a significant leap forward, though it still has plenty of quirks as this technology finds its footing.

What is Google Veo?

Google Veo is a family of AI video generation models that can create videos from text prompts or from static images. The latest model, Veo 3, includes native audio generation alongside video (the previous model, Veo 2, produced silent clips).

That native audio, real-world physics simulation, and advanced prompt understanding are what make Veo 3 stand out from other AI video generators. By comparison, other AI video generators, like Sora and Runway, don’t have native audio functionality (yet).

Google Veo 3 at a glance

Veo 3 is really impressive, and folks are already using it to overhaul their marketing strategies. Before we dive into how it works and what it can do, here’s a quick look at what it does well and where it still needs some love.

Google Veo pros:

  • Native audio and video generation with natural-sounding speech and background noise or music

  • Realistic physics simulation for elements like water, fabric, and light

  • Excellent cinematic camera controls and scene composition

  • Advanced prompt understanding, especially for interaction cues

  • Multiple input options (text, image, frames)

  • Integrated in Flow and Gemini, with an intuitive interface (especially in Flow)

  • Constantly improving and already ahead of competitors like Runway or Sora

Google Veo cons:

  • Limited in length to 8 seconds

  • Inconsistent character continuity across scenes, even with detailed prompts

  • Prompt interpretation varies, making repeatable outputs hard

  • Limited text accuracy in visual elements (e.g., miswritten words)

  • Some bugs and crashes when combining images or switching between modes

  • Visible watermarks unless you pay for Ultra ($249.99/month)

Veo 3 has native audio generation alongside video generation

Native audio generation is Veo 3’s headline feature, and it’s impressive…when it works. In one experiment, for example, I was trying to create an obnoxious movie trailer. The voice was okay, but it didn’t produce the loud, energetic, and thunderous quality I wanted.

But overall, I found that the speech patterns feel natural, not robotic, and environmental sounds blend well with the visuals. Here’s an example of something I made.

The background music is also good, but here’s where I hit a limitation: it’s difficult to fit both meaningful dialogue and cinematic music within the 8-second clip limit (which currently applies across all available plans).

In videos without voice, the music fills the space beautifully, but when you need both dialogue and music, something has to give. For now, you might want to iron out the voice first and add the music you want after.

The physics in Veo 3 videos actually make sense

From all the videos I’ve seen made with it, Veo 3 excels at water physics, fabric movement, and lighting reflections. It handles complex scenarios like “rain on glass” or “smoke dispersal” more convincingly than competitors.

And during my testing, the physics felt pretty believable. It’s not perfect, as you can see in the video above, but people moved naturally, clothing behaved appropriately, and lighting looked realistic.

And Google knows it: you’ll see this plastered across its marketing.

Maintaining character consistency across scenes is tricky with Veo 3

Google markets character continuity as a key differentiator. There are two main features designed to help maintain character appearance across multiple shots:

  • Jump to helps create consistency around bringing a specific detail, like a character, into new videos.

  • Extend expands upon what’s happening in the current scene.

Of course, neither feature is perfect.

In my first test, I started with Veo 3 Fast (which allows text-to-video) for the initial shot. When I tried using Jump to for the second shot, Veo 3 Fast wasn’t compatible: it automatically switched to Veo 2 Fast, which lost the audio entirely.

Creating a video with Google Veo 3 on Flow

So I switched to Veo 3 Quality (which allows frame-to-video) for the second shot and added a third shot with another prompt using Veo 3 Quality. Neither of the Veo 3 Quality outputs included audio. And when I tried to add the third shot to my scene, the video got corrupted and I received the error: “Something Went Wrong.”

Overall, throughout my tests, separate prompts for different shots made it nearly impossible to achieve character consistency. Even with hyper-detailed descriptions, Veo 3 interpreted them as creative suggestions rather than strict requirements, giving me completely different actors and settings.

We’re still in the early days here, and what I was trying to do was complex, so I’m not surprised it didn’t nail it. But Veo getting this right feels like it’s probably just a few small tweaks away.

Veo 3 offers professional controls and quality, but it requires a lot of hand-holding

The camera work was outstanding in my tests. Veo 3’s cinematic language understanding delivers excellent composition and movement quality.

I got that same quality for single-person shots, but whenever I was dealing with multiple people in a video, I had to be a lot more prescriptive. For example, in my initial tests, people in meetings were looking toward the camera instead of at the person talking.

A screengrab from a Google Veo video showing people looking at the camera in a meeting

Slightly awkward.

I tried to fix this by adding explicit interaction cues to Veo prompts like “looking directly at him with concerned expressions” and “maintain eye contact with the presenter throughout.” The results were…medium.

Veo 3 is nuanced in its prompt adherence, but it doesn’t interpret the same prompt consistently

Veo 3 really seemed to understand nuanced prompts and delivered results that matched my vision. But there’s a limitation (the same one you’ll find with most large language models, too): Veo 3 doesn’t interpret the same prompt consistently.

Running identical prompts multiple times can yield surprisingly different results, making it difficult to get consistent outputs for professional workflows that require exact replication.

You can see the characters changing throughout, and when I capitalized the word “foundation,” things got even wonkier.

Veo 3 works with multiple input methods

Veo 3 offers multiple ways to create videos:

  • Text-to-video: This is the most widely tested and praised input method, and it’s what I used for most of my video generation tests.

  • Image-to-video: My testing with this gave mixed results, but when it works, it’s pretty neat. Note that this currently uses Veo 2 Fast, so there’s no audio.

    An animated version of Maddy's headshot created with Veo 3
    Animating my headshot with Google Veo
  • Frames-to-video: This one has limited Veo 3 compatibility, as I experienced firsthand. You have to use Veo 2 or Veo 3 Quality, which can create workflow complications. But it includes access to camera controls, which allow precise shot direction (e.g., wide, close-up, tracking shots).

    Using frames-to-video in Google Veo with Flow

Where can you access Veo 3?

Veo 3 is available in both the Gemini chatbot and Flow, Google’s AI filmmaking app. Personally, I found it easier to use with Flow.

It’s also more accessible in Flow because Gemini gives Google AI Pro subscribers only 10 trial Veo 3 videos, while Flow offers 100 generations per month for the same plan. (More on pricing in a bit.)

Flow is also purpose-built for video creation with professional tools like:

  • Camera controls for precise shot direction

  • Scene-building capabilities

  • Project management and organization

Here’s a video clip example with camera controls.

There are also some geographic limitations to be aware of, namely that Flow isn’t available throughout the EU yet. You can use Veo 2 in the EU via Gemini only. Otherwise, it’s available in 70+ countries, including the US, Canada, Australia, the UK, and India.

Google Veo pricing

You’ll need to access Google Veo through a broader Google subscription plan, namely:

  • Google AI Pro ($19.99 per month) gives you 1,000 monthly AI credits. For Flow, that’s 100 credits for Veo 3 Quality, 20 credits for Veo 3 Fast, and 10 credits for Veo 2 Fast.

  • Google AI Ultra ($249.99 per month) gives you 12,500 monthly AI credits, early access to new features, and no visible watermarks. It’s notably the only plan that doesn’t include watermarks (paying Pro customers aren’t super happy about it).

Ultra subscribers also get access to Ingredients to Video. This feature lets you add individual elements (characters, objects, backgrounds) separately and combine them into scenes for better consistency across shots.
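
To make the credit math concrete, here’s a rough Python sketch of how far each plan’s monthly credits might stretch. It assumes the Flow figures above are per-video costs and that the same costs apply on both plans; neither is confirmed here, and the names MONTHLY_CREDITS, CREDITS_PER_VIDEO, and videos_per_month are made up for illustration.

```python
# Figures are taken from the pricing list above. Treating them as the cost of
# one generated video, and as identical on both plans, is an assumption.
MONTHLY_CREDITS = {"Google AI Pro": 1_000, "Google AI Ultra": 12_500}
CREDITS_PER_VIDEO = {"Veo 3 Quality": 100, "Veo 3 Fast": 20, "Veo 2 Fast": 10}


def videos_per_month(plan: str) -> dict[str, int]:
    """Return how many whole videos each model allows if you use only that model."""
    budget = MONTHLY_CREDITS[plan]
    # Integer division: leftover credits that can't pay for a full video are ignored.
    return {model: budget // cost for model, cost in CREDITS_PER_VIDEO.items()}


if __name__ == "__main__":
    for plan in MONTHLY_CREDITS:
        print(plan, videos_per_month(plan))
```

Under those assumptions, a Pro plan spent entirely on Veo 2 Fast lines up with the 100 generations per month mentioned earlier, while sticking to Veo 3 Quality would burn through the same budget in about 10 videos.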

How to create your first video with Veo 3

Once you’re subscribed to the Google AI Pro or Ultra plan, head over to Flow (what I recommend) or use the Gemini app (though I had a hard time getting it to work there).

Here’s how to get started with Google Veo in Flow:

  1. Click in the prompt field, and describe your scene in detail. Include specifics: setting, characters, actions, and camera angles.

  2. For dialogue, use quotes, e.g., “Character says ‘specific dialogue here.’”

  3. Add interaction cues, e.g., “looking directly at each other” or “nodding in agreement.”

  4. Specify audio, like background music type, environmental sounds, and so on.

After you set everything up according to your preferences, generate the video and wait for the end result, which is typically available within a few minutes.
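
If you end up writing a lot of prompts, it can help to keep the four ingredients from the steps above separate and stitch them together at the end. Here’s a minimal Python sketch of that idea. The ScenePrompt class and its field names are hypothetical (this isn’t a Google API), and the output is just plain text you’d paste into the Flow prompt field.

```python
from dataclasses import dataclass, field


@dataclass
class ScenePrompt:
    """Hypothetical container for the prompt ingredients: scene, dialogue, cues, audio."""

    scene: str  # setting, characters, actions, camera angles
    dialogue: dict[str, str] = field(default_factory=dict)  # speaker -> spoken line
    interaction_cues: list[str] = field(default_factory=list)
    audio: str = ""  # background music type, environmental sounds

    def render(self) -> str:
        """Join the ingredients into one plain-text prompt."""
        parts = [self.scene.strip()]
        for speaker, line in self.dialogue.items():
            # Put spoken lines in quotes, as suggested in step 2.
            parts.append(f'{speaker} says "{line}"')
        parts.extend(cue.strip() for cue in self.interaction_cues)
        if self.audio:
            parts.append(f"Audio: {self.audio.strip()}")
        # End each part with a period unless it already closes with one or a quote.
        return " ".join(p if p.endswith((".", '"')) else p + "." for p in parts)


if __name__ == "__main__":
    prompt = ScenePrompt(
        scene="Wide shot of a small office meeting",
        dialogue={"The presenter": "Revenue is up."},
        interaction_cues=["Everyone is looking directly at the presenter"],
        audio="soft ambient office noise",
    )
    print(prompt.render())
```

Keeping the pieces separate also makes it easier to change one thing at a time (say, only the interaction cues) when a generation doesn’t come out the way you wanted.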

Here are a few tips:

  • Avoid using all caps for emphasis. It confuses audio generation.

  • Be extremely specific about character interactions to avoid the camera-staring I mentioned earlier.

  • For multi-shot sequences, embrace scene changes rather than fighting for consistency.

  • Test different variations of the same prompt to find what works.
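
The all-caps tip is easy to check automatically before you spend credits. Here’s a small Python sketch of one way to do it. The function name find_all_caps_words and the four-letter cutoff (so that short acronyms like AI or UK aren’t flagged) are my own assumptions, not anything Veo or Flow documents.

```python
import re


def find_all_caps_words(prompt: str, min_length: int = 4) -> list[str]:
    """Return words written entirely in capital letters, at least min_length long."""
    # \b marks word boundaries; {n,} means "n or more" capital letters in a row.
    pattern = rf"\b[A-Z]{{{min_length},}}\b"
    return re.findall(pattern, prompt)


if __name__ == "__main__":
    flagged = find_all_caps_words("The FOUNDATION of our AI plan")
    if flagged:
        print("Consider lowercasing:", ", ".join(flagged))
```

It won’t catch everything (it only looks at plain A to Z letters), but it’s a quick sanity check before you hit generate.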

And reminders on a few limitations:

  • Every video is capped at 8 seconds, which severely limits storytelling possibilities. You can’t develop complex narratives or showcase detailed processes.

  • Google AI Pro users get visible watermarks on all generated content. Only Ultra subscribers ($250 per month) avoid this branding.

  • When I provided a frame containing a profit and loss statement, Veo 3 added text errors like “Expensestes” instead of “Expenses.” The AI struggles with accurate text generation within scenes. That said, so did OpenAI’s early image models, so I’m sure this won’t be an issue for long.

Give Google Veo a spin for AI video generation

Here’s my very flawed (but still impressive) final cut using Google Veo with Flow.

Veo isn’t ready for prime time start-to-finish projects, but the premise and early tests are really promising. And taking the quirks for a spin now will give you an idea of how to refine and speed up the video creation process down the road as the technology catches up.
