5/12/23

"Soundtracks" for Generative AI Video

Inspired by Runway's Gen-1 'Video to Video' feature, I envisioned an exciting enhancement: creative sound controls that generate unique soundtracks to match the award-winning visuals we see.

In the linked video, I demonstrate this concept, using Gen-1 in the Runway iPhone app as inspiration. The video shows a creator choosing from four distinct sonic presets to generate a customized soundtrack for their video, ready to share as a complete movie file. Two examples use AI voices to simulate character voice replacement, while another applies a voice effect to the original audio. Each example is a different sonic take on the same visual, adding a thrilling audiovisual dimension to the final work.

As a sound and product designer, this concept prompted me to dive deeper into generative AI content and imagine the evolution of soundtracks. Some standout features include:

🎧👉🏽 - Presets: Easily select recognizable sonic treatments for quick and easy sharing.

🗣️🌍 - AI Voice Replacement & Effects: Enable expressive AI characters and voice effects, providing localized versions that cater to a wider global audience. I utilized expressive AI characters from https://play.ht/ with added voice processing in this concept.

🎚️🎛 - Advanced Controls: Fine-tune the mix elements, including music, ambiences, sound effects, and voice, to achieve a customized result.

Generative AI products prioritizing great soundtracks, such as branded sound, music videos, and cinematic consumables, present exciting challenges and opportunities. Considerations include:

🏞️🧠 - Object Recognition: Can sounds be associated with people, places, or things in a scene, allowing users to adjust and customize them?

🎥🚶‍♀️ - Position and Motion: Can the system generate sounds for movement and create natural audio mixes for objects that move within and beyond the perspective?

🎶🎙 - Content Generation: Should sound content be generated, pulled from existing libraries, drawn from user-generated recordings, or produced by a hybrid of these? Exploring this with content creators could yield valuable insights.
