Google DeepMind's new AI tech will generate soundtracks for videos

Google DeepMind has developed video-to-audio (V2A) technology that generates soundtracks and dialogue for videos, combining visual information with optional text prompts to produce matching audio. The team is still refining audio quality and synchronization before a wider release.

Google DeepMind has unveiled new artificial intelligence technology capable of generating soundtracks and dialogue to accompany videos. This video-to-audio (V2A) technology can interpret the raw pixels of a video and combine them with optional text prompts to produce corresponding sound effects.
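One basic requirement such a system must satisfy is that the generated audio stays synchronized with the clip: its length has to match the video's duration exactly. The sketch below is a toy illustration of that constraint, not DeepMind's actual API; the function name, parameters, and silent-audio placeholder are all hypothetical.

```python
def generate_audio(frames, fps=24, sample_rate=48_000, prompt=None):
    """Toy stand-in for a V2A model (hypothetical interface).

    Returns a silent waveform whose sample count matches the
    duration implied by the input frames, i.e. len(frames)/fps
    seconds of audio at the given sample rate. A real model would
    condition on the pixels and the optional text prompt.
    """
    num_samples = round(len(frames) / fps * sample_rate)
    return [0.0] * num_samples


# Two seconds of video at 24 fps -> two seconds of audio at 48 kHz.
frames = [None] * 48  # placeholder frames
audio = generate_audio(frames, prompt="footsteps on gravel")
print(len(audio))  # 96000 samples = 2 s * 48,000 Hz
```

The duration arithmetic is the only real content here; it shows why frame rate and sample rate must both be known for the output to line up with the video.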

The research team trained the AI on videos, audio, and AI-generated annotations that include detailed descriptions of sounds along with transcripts of spoken dialogue. This training enables the model to associate specific sounds with visual scenes, even for footage that has no soundtrack, such as silent films.

DeepMind acknowledges that the V2A technology still has limitations: audio quality can degrade when the input video contains distortions, and lip synchronization for generated dialogue remains imperfect. The team says it is committed to rigorous safety assessments and continued improvements before making the technology widely available.