Google DeepMind's new AI tech will generate soundtracks for videos
Google DeepMind develops AI tech to generate soundtracks and dialogue for videos, combining visual and text data for accurate audio effects.
Google DeepMind has unveiled new artificial intelligence technology capable of generating soundtracks and dialogue to accompany videos. This video-to-audio (V2A) technology interprets the raw pixels of visual content and combines them with optional text prompts to produce corresponding sound effects.
The research team trained the AI on videos, audio, and AI-generated annotations that include detailed descriptions of sounds as well as dialogue transcripts. This training enables the model to associate specific sounds with visual scenes, even for footage that has no soundtrack of its own, such as silent films.
DeepMind acknowledges that the V2A technology still has limitations: audio quality can degrade when the source video contains distortions, and lip synchronization for generated dialogue remains imperfect. The team says it is committed to rigorous safety assessments and continued improvements before making the technology widely available.