Hands-on review

I tested Google Veo 4: longer scenes, better physics

What changed since Veo 3, what holds up under client prompts, and where the rollout still lags.

By the Vuela.ai content team

Cover from deepmind.google/models/veo.

What it nails

  • Longer single-shot scenes (up to 12 seconds)
  • Identity persistence across multi-shot prompts
  • Sharper texture and skin detail than Veo 3
  • Native audio matching Veo 3 fidelity

Where it struggles

  • Staged rollout, regional gating still common
  • API around $0.60 per second with audio
  • Locked into the Google ecosystem
  • No built-in cloner or lip-sync translator

Veo 4 is the next-step release in Google DeepMind’s text-to-video family. The bet is incremental rather than revolutionary: take the parts of Veo 3 that worked (native audio, prompt fidelity) and push the parts that did not (length, identity persistence, sharpness). After two weeks of testing inside the Gemini app, Flow, and the API, my verdict is that Veo 4 is the right call for short-form ads and product video, and still not the right answer for cloning, translation, or anything that needs a full pipeline.

What changed in Veo 4

Three concrete deltas versus Veo 3 showed up in the prompts I ran. Single-shot length moves from 8 seconds to 12 seconds before the model defaults to a chained extension. Identity persistence across two follow-up prompts is meaningfully better: faces, wardrobe, and props survive a cut without the “fraternal twin” drift that haunted Veo 3.1. Texture and skin detail step up a tier; close-ups no longer get the slightly waxy look Veo 3 produced under harsh light.

Audio is essentially the same engine as Veo 3, but dialogue prosody is cleaner on emotional lines. The “sad smile / British accent” test that Veo 3.1 passed on four of five takes passes on five of five with Veo 4. Music generation is unchanged, which is fine for ambient beds and not enough for a finished track.

How I got access

Google AI Pro ($19.99/mo) unlocked Veo 4 in Gemini and Flow the day I requested it. AI Ultra ($249.99/mo) unlocks higher daily quotas and the highest-tier render queue. For API tests I provisioned a managed endpoint with metered billing at around $0.60 per second of generated video with audio (a Fast tier sits around $0.30 per second without audio).

The three tests I ran

  1. 12-second single shot. A woman walking from a sunny courtyard into a shadowed hallway, camera dollying behind. Lighting transition was the unforgiving part.
  2. Two-shot identity. A man in a navy blazer in shot one, same man entering a cafe in shot two. Face and wardrobe locked, or drift?
  3. Dialogue + motion. A character running while shouting a line over the shoulder. Combined motion and lip sync at the same time.

Long-form scene composition in the Veo family. Official sample from Google DeepMind.

Test 1. 12-second single shot

The lighting transition was the test, and Veo 4 nailed it. Sunlight on the courtyard cobbles fell off into shadow with correct contact shadows at the woman’s heels, and no popping or banding at the doorway. Of five takes, four were postable; the fifth had a frame jitter halfway through that I assume was a render anomaly. Twelve seconds in a single shot is a real workflow change: stitching used to be where Veo 3 lost cinematic feel, and Veo 4 collapses that into one render.

Test 2. Two-shot identity

This is where Veo 4 most clearly beats its predecessor. The same man appeared in both shots with the same face, the same blazer, and the same hair, across five attempts in a row. Veo 3.1 used to drop identity at the second prompt about half the time. Kling 3 still has a slight edge on extreme close-ups, but for medium and wide shots Veo 4 is comparable. For ad campaigns that need a recurring character, this is the unlock.

Test 3. Dialogue + motion

A character running and shouting is the unforgiving native-audio test. Lip sync stayed coherent through the head turn and the over-the-shoulder pose. The voice quality is still thinner than a dedicated voice tool such as ElevenLabs, but the timing is right, and prosody on emotional lines is the biggest jump over Veo 3.

Veo character + audio test from the official showcase. Official sample from Google DeepMind.

The annoying parts

Staged rollout. My EU teammate still gets Veo 3.1 by default. Veo 4 is being released by region and account tier. Plan around inconsistency for the next quarter.

API price. $0.60 per second with audio is steeper than Veo 3 was at launch. A 12-second clip is $7.20. Five attempts is $36. Budget accordingly.
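The budget math above is easy to script. Below is a minimal Python sketch, assuming the approximate rates quoted in this review (they are not official pricing). The function names are my own, and the four-of-five usable rate is the one I saw in Test 1, not a guarantee.

```python
from decimal import Decimal

# Approximate rates quoted in this review; assumptions, not official pricing.
RATE_WITH_AUDIO = Decimal("0.60")     # USD per generated second, with audio
RATE_FAST_NO_AUDIO = Decimal("0.30")  # USD per generated second, Fast tier


def clip_cost(seconds: int, rate: Decimal = RATE_WITH_AUDIO) -> Decimal:
    """Cost of generating one clip of the given length."""
    return rate * seconds


def batch_cost(seconds: int, attempts: int, rate: Decimal = RATE_WITH_AUDIO) -> Decimal:
    """Cost of generating the same clip several times (every take is billed)."""
    return clip_cost(seconds, rate) * attempts


def cost_per_usable_clip(
    seconds: int, attempts: int, usable: int, rate: Decimal = RATE_WITH_AUDIO
) -> Decimal:
    """Total spend divided by the number of takes you can actually post."""
    if usable <= 0:
        raise ValueError("need at least one usable take")
    return batch_cost(seconds, attempts, rate) / usable


if __name__ == "__main__":
    print(f"one 12 s clip:        ${clip_cost(12):.2f}")
    print(f"five attempts:        ${batch_cost(12, 5):.2f}")
    print(f"per usable (4 of 5):  ${cost_per_usable_clip(12, 5, 4):.2f}")
```

At the quoted rate, five 12-second attempts with four usable takes works out to $9.00 per usable clip, not the $7.20 sticker price. Decimal is used instead of float so the dollar amounts stay exact.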

No pipeline. Veo 4 generates clips. It does not clone viral formats, translate finished video, or repurpose into vertical/square formats. For a production pipeline you still need other tools on top.

Is it worth the price?

For creators doing short-form ads and product video, Veo 4 inside AI Pro ($19.99/mo) is an obvious upgrade. For developers integrating into a product, the per-second API math gets steep fast, and flat-rate aggregators can end up cheaper once monthly volume is high enough.
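To see where the per-second math crosses over, here is a rough break-even sketch in Python. The $0.60 rate is the approximate figure quoted in this review. The flat monthly price is a placeholder I made up for illustration, not any real plan's price, so swap in the number you are actually quoted.

```python
from decimal import Decimal

API_RATE = Decimal("0.60")  # approximate USD per generated second, from this review
HYPOTHETICAL_FLAT_MONTHLY = Decimal("120")  # placeholder only, not a real plan price


def break_even_seconds(flat_monthly: Decimal, rate: Decimal = API_RATE) -> Decimal:
    """Generated seconds per month at which metered API spend equals the flat fee."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return flat_monthly / rate


def cheaper_option(
    seconds_per_month: int, flat_monthly: Decimal, rate: Decimal = API_RATE
) -> str:
    """Return "api" if metered billing is strictly cheaper, otherwise "flat"."""
    api_total = rate * seconds_per_month
    return "api" if api_total < flat_monthly else "flat"


if __name__ == "__main__":
    seconds = break_even_seconds(HYPOTHETICAL_FLAT_MONTHLY)
    print(f"break-even: {seconds:.0f} generated seconds per month")
    for volume in (100, 300):
        print(volume, "s/month ->", cheaper_option(volume, HYPOTHETICAL_FLAT_MONTHLY))
```

With the placeholder $120 plan, break-even lands at 200 generated seconds per month, which is between 16 and 17 twelve-second attempts. Failed takes count toward that total, since every render is billed.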

How Vuela.ai fits into a Veo 4 workflow

Vuela.ai bundles Veo-class generation with the things Veo cannot do on its own: a viral video cloner, a lip-sync translator into 30+ languages, and product-to-video for ecommerce. New Veo versions (Veo 3.1, Veo 4, Veo 4 Fast) roll into your plan as Google releases them, without you having to provision API access or budget per-second billing.

Veo 4 quality without the rollout wait

Vuela.ai exposes the latest Veo models on a flat plan alongside cloner, translator, and 70+ tools.

The verdict

Veo 4 is the right Veo for 2026. Longer scenes and reliable identity solve the two biggest reasons teams were stitching Veo 3 with another model. For ads, product, and brand video, it is the safer bet over Sora 2 on quality and over Kling 3 on prompt fidelity. Sora 2 still wins on physics-heavy scenes and consumer features; Kling 3 still wins on stylised image-to-video.

Build your pipeline around it, with a cloner and a translator on top. Veo 4 is a magnificent specialist that needs a workspace, not a replacement for one.

Veo 4 review FAQ

Is Veo 4 publicly available?

Veo 4 is rolling out progressively. Access is staged through Google AI Pro, Google AI Ultra, the Flow app, and the Vertex AI / Gemini API. Most accounts can request the model today; some regions still see Veo 3.1 by default.

How is Veo 4 different from Veo 3?

Three concrete deltas: longer single-shot duration (up to 12 seconds), sharper detail in textures and skin, and better identity persistence across cuts. Audio quality matches Veo 3, with refined dialogue prosody.

How much does Veo 4 cost?

Consumer access is bundled into Google AI Pro ($19.99/mo) and AI Ultra ($249.99/mo). On the API, Veo 4 with audio is around $0.60 per second of video, with a Fast tier around $0.30 per second.

Veo 4 vs Sora 2?

Veo 4 leads on prompt fidelity and audio integration. Sora 2 still has the edge on physics, length, and consumer features like Cameos. For ads and product video, Veo 4 is the safer pick.

Can I use Veo 4 inside Vuela.ai?

Vuela.ai exposes Veo-class generation alongside the cloner, lip-sync translator, and 70+ tools under a flat-rate subscription. New model versions roll into your plan as Google releases them.
