Google Veo 3: AI video with picture and sound

At Google I/O 2025, Google DeepMind unveiled the next generation of its video model: Google Veo 3. Where earlier AI video models were limited to silent or still-frame clips, Veo 3 takes a clear step forward. From a single text prompt, the model generates video and adds matching audio: simulated speech, ambient sound, and music.

The technology relies on multimodal AI that combines language, image, and sound. With it, Google is setting a new bar for synthetic content production.

What sets Veo 3 apart?

Veo 3 produces high-quality video clips (720p, 24 fps, up to 8 seconds) with:

Realistic facial animation and lip-sync that match the generated speech.
Dynamic sound effects that fit the scene, such as rain, traffic, or footsteps.
Background music that supports the mood of the scene.

The generated output is surprisingly convincing and, in places, approaches professionally produced content.

Alongside Veo, Google also introduced Flow: an accessible tool that lets users build scenes, refine prompts, and combine video components in one place. Flow brings models like Veo, Imagen (AI imagery), and Gemini (language) together in a single environment, aimed at content creation for professionals and hobbyists alike.

Where earlier AI models struggled with unnatural hand movement or warped perspective, Veo 3 is notably better at simulating physical interaction, depth, and camera motion realistically. The result is a cinematic style that suits a wide range of use cases.

A few examples of footage generated with Veo 3 are shown below:

Access and pricing

For now, Veo 3 is only available to selected users through the Google VideoFX program. A broader commercial release is in the works, most likely on a subscription model. The barrier to access remains high in the short term.

Disinformation and misuse

The ability to generate convincing video with sound opens the door to new forms of disinformation, deepfakes, and synthetic propaganda. Google says the model ships with watermarking and moderation, but that will not stop bad actors from finding workarounds.

Conclusion

Google Veo 3 is a meaningful step in the evolution of AI video. For organisations that want to lead in content creation, it offers powerful capabilities, provided it is used with an eye on ethics, authenticity, and transparency. The next few years will determine how we anchor this technology in responsible use and regulation.

Veo 3 is not just another tool. It is a catalyst for change. The teams that dig in now will be the ones leading the next wave of digital transformation.

How are you handling the rapid development of AI video? Will you watch from the sidelines, or actively put the technology to work for your goals? At Echovise we are happy to think it through with you and figure out where your organisation can make a difference.
