AudioLM, a system developed by Google researchers, generates all kinds of sounds, including complex ones like piano music or people talking, that continue almost seamlessly from the initial fragment fed to it.
The technique is very promising and could be useful in many ways: for example, speeding up the training of artificial intelligence, or automatically generating music to accompany videos. But it is much more than that.
Play it again, Sam
We are already used to hearing audio generated by artificial intelligence. Anyone who argues with Alexa or Google Nest every day knows it well: our voice assistants process natural language.
There are, to be sure, also systems trained on music: remember Jukebox by OpenAI? I told you about it here. All these systems, however, rely on long and complex "training", which involves cataloging and feeding in vast amounts of data. Our artificial intelligences are hungry for data, and they always want more.
The next step is to make the AI "think" by enabling it to process the information it listens to more quickly, without the need for lengthy training. Something similar to what researchers are trying to do with self-driving systems.
How AudioLM works
To generate the audio, a few seconds of song or sound are fed into AudioLM, which literally predicts what comes next. It is not Shazam: it does not look up the whole piece and replay it. It does not make collages of sounds it has in memory. It builds them. The process is similar to the way language models like GPT-3 predict phrases and words.
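To see the principle behind this kind of prediction, here is a deliberately tiny sketch. AudioLM itself uses Transformers over learned semantic and acoustic tokens; the toy "model" below is just bigram counts over a made-up token sequence, and all names and data are illustrative assumptions, not the real system.

```python
# Toy illustration of autoregressive continuation, the same principle
# AudioLM applies to discretized audio tokens. The "model" here is just
# bigram statistics; the real system is a Transformer over learned tokens.
from collections import Counter, defaultdict

def train_bigrams(tokens):
    """Count which token most often follows each token."""
    follow = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        follow[cur][nxt] += 1
    return follow

def continue_sequence(follow, prompt, n_steps):
    """Greedily predict what comes next, one token at a time."""
    out = list(prompt)
    for _ in range(n_steps):
        nxt_counts = follow.get(out[-1])
        if not nxt_counts:
            break  # never seen this token during "training"
        out.append(nxt_counts.most_common(1)[0][0])
    return out

# A repeating "melody" of toy audio tokens stands in for training audio.
training = [1, 2, 3, 1, 2, 3, 1, 2, 3]
model = train_bigrams(training)
print(continue_sequence(model, [1, 2], 4))  # → [1, 2, 3, 1, 2, 3]
```

The point is only that the system never copies a stored clip: given the end of the prompt, it keeps asking "what usually comes next?" and appends its own prediction, one token at a time.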
The audio clips released by the Google team sound very natural. In particular, the piano music generated by AudioLM seems more fluid than that generated by current artificial intelligences. In other words, it is better at capturing the way we produce a song, or a sound.
"It's really impressive, not least because it indicates that these systems are learning some kind of layered structure," says Roger Dannenberg, a researcher in computer-generated music at Carnegie Mellon University.
Not just a song
Imagine speaking a couple of words to AudioLM and then stopping. The system will continue the speech, picking up your cadence, your accent, your pauses, even your breathing. In summary, exactly the way you speak. There is no need for specific training: it can do it almost on its own.
Like a parrot repeating the things it hears. Only this is a parrot capable of receiving and producing any sound, and of autonomously completing those left unfinished.
In summary? We will very soon (and in this field "soon" really means soon) have systems that are able to speak much more naturally, and to compose a song or a sound, exactly as DALL·E 2, Midjourney and others create images, or Make-A-Video creates clips based on our input.
Who owns the rights to a song?
While these systems will be able to create content almost by themselves, that "almost" still makes all the difference in the world, and makes it necessary to consider the ethical implications of this technology.
If I say "So, make me a different ending for Bohemian Rhapsody" and this thing produces a song along those lines, who can claim the rights and collect the royalties? Not to mention the fact that sounds and speech now indistinguishable from human ones are far more convincing, and open the door to an unprecedented spread of disinformation.
In the document published to present this AI (I link it here), the researchers write that they are already considering how to mitigate these problems by building in ways to distinguish natural sounds from those produced with AudioLM. I have my doubts: many of the purposes for which this AI was created would be defeated.
More generally, the risk is producing a phenomenon I would call "distrust of reality". If everything can be true, nothing can be. Nothing has value.