Are you ready for the latest and greatest in AI-generated tunes? Well, hold onto your headphones, because Google has just unveiled more information about MusicLM, a revolutionary model that can create high-quality music from text descriptions. That’s right, you can now turn your wildest musical fantasies into reality with just a few keystrokes.
So, what exactly is Music LM?
Simply put, it’s a machine learning model that synthesizes high-fidelity music from text descriptions. This cutting-edge technology builds on AudioLM, Google’s earlier framework for high-quality audio generation. And the best part? MusicLM can generate music consistently at 24 kilohertz over several minutes, all while sticking to the brief. Google released it on January 26th as a state-of-the-art (SOTA) entry in the text-to-music field, and, surprisingly, the model generates music from text captions without using any diffusion.
But how does Music LM actually work?
The model treats music generation as a hierarchical sequence-to-sequence modeling task built on three pre-trained models: MuLan, w2v-BERT, and SoundStream. MuLan is a joint music-text embedding model: during training it produces similar tokens for an audio clip and its associated text description, which lets the system train on audio alone. The training audio is tokenized by these models, and two autoregressive stages learn to generate from the MuLan conditioning: a semantic stage that predicts w2v-BERT semantic tokens, capturing long-term structure such as melody and rhythm, and an acoustic stage that predicts SoundStream acoustic tokens from the MuLan and semantic tokens. After training, text can be passed to MuLan instead of audio, and the resulting acoustic tokens are decoded through SoundStream’s decoder to synthesize the final waveform.
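The chaining of these stages can be sketched with stand-in components. Everything below is a hypothetical illustration: `mulan_tokens`, `semantic_stage`, and `acoustic_stage` just produce deterministic dummy token arrays in place of the real pre-trained models, to show how conditioning flows from text through semantic tokens to acoustic tokens.

```python
import hashlib
import numpy as np

VOCAB = 1024  # assumed token vocabulary size, for illustration only

def mulan_tokens(prompt: str, n: int = 12) -> np.ndarray:
    """Stand-in for MuLan: map a text prompt to conditioning tokens.
    (The real MuLan is a learned joint music-text embedding model.)"""
    seed = int.from_bytes(hashlib.sha256(prompt.encode()).digest()[:4], "big")
    return np.random.default_rng(seed).integers(0, VOCAB, size=n)

def semantic_stage(cond: np.ndarray, length: int) -> np.ndarray:
    """Stage 1 stand-in: coarse semantic tokens (long-term structure),
    conditioned on the MuLan tokens. The real stage is autoregressive."""
    rng = np.random.default_rng(int(cond.sum()))
    return rng.integers(0, VOCAB, size=length)

def acoustic_stage(cond: np.ndarray, semantic: np.ndarray) -> np.ndarray:
    """Stage 2 stand-in: fine acoustic tokens from MuLan + semantic tokens.
    SoundStream's decoder would turn these into a 24 kHz waveform."""
    rng = np.random.default_rng(int(cond.sum() + semantic.sum()))
    # several acoustic tokens per semantic token (finer time resolution)
    return rng.integers(0, VOCAB, size=4 * len(semantic))

def generate(prompt: str, length: int = 50) -> np.ndarray:
    cond = mulan_tokens(prompt)
    semantic = semantic_stage(cond, length)
    return acoustic_stage(cond, semantic)

tokens = generate("relaxing jazz with a saxophone solo")
print(tokens.shape)  # (200,): 4 acoustic tokens per semantic token
```

Note that at inference time only the text prompt is needed, because MuLan places text and audio in the same token space.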
If you’re interested in studying or developing music-generating machine learning models, Google has released a public dataset called MusicCaps, which consists of 5,500 music-text pairs with rich text descriptions written by human experts. The demo website offers examples across genres including reggaeton, electronic dance music, meditative music, jazz, pop, rock, death metal, and more. The AI model can generate music for specific situations, such as escaping prison or a futuristic club. The model can also generate music for paintings, such as Napoleon Crossing the Alps or The Scream.
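As a minimal sketch of what working with such a caption dataset looks like, the snippet below parses a tiny inline CSV shaped like a music-text pair file. The column names (`ytid`, `start_s`, `end_s`, `caption`) and the sample rows are assumptions for illustration; consult the released MusicCaps files for the actual schema.

```python
import csv
import io

# Tiny inline sample imitating a MusicCaps-style CSV (assumed layout).
sample_csv = """ytid,start_s,end_s,caption
abc123,30,40,"A mellow jazz piece with a walking bassline and brushed drums."
def456,0,10,"Fast reggaeton beat with synth stabs and a catchy vocal hook."
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
for row in rows:
    clip_len = float(row["end_s"]) - float(row["start_s"])
    print(f'{row["ytid"]}: {clip_len:.0f}s, {row["caption"][:40]}...')
```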
What is the MusicLM story mode?
MusicLM also has a unique feature called story mode, in which it plays a continuous piece of music that changes with a sequence of text conditions. It’s like having your own personal DJ who can create a custom playlist. The story mode examples include text conditions such as jazz, pop, rock, death metal, rap, a string quartet with violins, an epic movie soundtrack with drums, and a Scottish folk song with traditional instruments. Plus, MusicLM’s long-generation capabilities allow users to create music that lasts several minutes while still sounding natural and enjoyable.
MusicLM can even turn a story-like description into music: given Wikipedia descriptions of paintings, for example, it can generate soundtracks that fit them perfectly.
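The story-mode idea, a timed sequence of text conditions steering one continuous piece, can be sketched as a simple prompt schedule. The prompts below echo the style of the demo examples, and the `prompt_at` helper is a hypothetical illustration, not part of MusicLM's API.

```python
# Each text condition is attached to a start time (in seconds); the active
# prompt changes as generation advances through the piece.
story = [
    (0, "time to meditate"),
    (15, "time to wake up"),
    (30, "time to run"),
    (45, "time to give 100%"),
]

def prompt_at(t: float, schedule) -> str:
    """Return the text condition active at time t (seconds)."""
    active = schedule[0][1]
    for start, prompt in schedule:
        if t >= start:
            active = prompt
        else:
            break
    return active

print(prompt_at(20, story))  # time to wake up
```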
Generation Diversity
MusicLM is more powerful than previous text-to-music AI models because of the flexibility it offers and its ability to follow very long text prompts. The technology also has wider generation diversity, meaning the same text prompt can generate a wide range of different music compositions.
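Generation diversity of this kind typically comes from sampling rather than greedy decoding. The toy sketch below (not MusicLM's actual decoder) samples token sequences from a fixed set of logits: with a positive temperature, the same "prompt" yields many different sequences, while greedy decoding always returns the same one.

```python
import numpy as np

def sample_sequence(logits: np.ndarray, length: int, temperature: float, seed: int):
    """Sample a token sequence from fixed per-step logits.
    temperature <= 0 means greedy decoding (always the argmax token)."""
    rng = np.random.default_rng(seed)
    if temperature <= 0:
        return [int(np.argmax(logits))] * length
    probs = np.exp(logits / temperature)
    probs /= probs.sum()  # normalize to a probability distribution
    return [int(rng.choice(len(logits), p=probs)) for _ in range(length)]

logits = np.array([2.0, 1.5, 1.0, 0.5])  # toy 4-token vocabulary

greedy = {tuple(sample_sequence(logits, 8, 0.0, s)) for s in range(20)}
diverse = {tuple(sample_sequence(logits, 8, 1.0, s)) for s in range(20)}
print(len(greedy), len(diverse))  # greedy collapses to 1 unique sequence
```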
Examples of AI-generated Music
The Music LM demo page offers several sample music tracks to listen to, each with its text condition listed alongside it. The site includes arcade game soundtracks, reggaeton and electronic dance music fusion, melodic techno swing, relaxing jazz, and many more.
Why isn’t Google releasing MusicLM to the public?
The main reason why Google has no immediate plans to release MusicLM for public use is due to copyright concerns. During an internal experiment, they found that approximately 1% of the generated music was an exact replica of a piece of music that the model was trained on. The co-authors of the research paper acknowledged the potential risk of misappropriating creative content and stressed the need for further work to address these risks associated with music generation.
The fact that even a 1% replication rate exists is enough to make Google hesitant about releasing MusicLM. Furthermore, the copyright questions around using content to train AI models remain unsettled. Creating music, or any other creative content, takes extensive time and effort, and it raises ethical concerns when an AI model builds its output on someone else’s work. This presents an intriguing new challenge for researchers to tackle.
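A rough idea of how such a memorization audit might work (a toy sketch, not Google's actual methodology): compare generated token sequences against the training set for exact n-gram matches, and report the fraction of generations that contain a copy.

```python
def ngrams(seq, n):
    """All contiguous n-grams of a token sequence, as a set of tuples."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}

def replication_rate(generated, training, n=4):
    """Fraction of generated sequences sharing an exact n-gram with training data."""
    train_grams = set().union(*(ngrams(t, n) for t in training))
    copied = sum(1 for g in generated if ngrams(g, n) & train_grams)
    return copied / len(generated)

training = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11]]
generated = [
    [1, 2, 3, 4, 99, 98],   # copies the 4-gram (1, 2, 3, 4) from training
    [50, 51, 52, 53, 54],
    [60, 61, 62, 63, 64],
    [70, 71, 72, 73, 74],
]
print(replication_rate(generated, training))  # 0.25
```

In practice an audit over real audio would need fuzzier matching than exact token equality, which is part of why this remains an open research problem.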
Conclusion
Thanks to Google’s MusicLM, you will soon be able to bring your musical dreams to life with ease. Whether you’re a musician, a music lover, or just someone who enjoys experimenting with new technology, MusicLM marks an important step for AI in the audio world. It challenges the traditional way of creating music and opens doors to new forms of music that were not possible before. So, Google, when can we start playing?