VALL-E, Microsoft's AI that "steals" your voice in 3 seconds

January 10, 2023

Technology

The new artificial intelligence system reproduces a human voice starting from a few seconds of audio. Great potential (and great risks).

As you know, artificial intelligence is the topic of recent months: an explosion has just begun, and we will only see its full effects over the next few years.

Microsoft is also putting wind in the sails of this technology: it recently used AI to improve the functionality of its apps, and now it could invest as much as 10 billion dollars in OpenAI, the company that created ChatGPT. Today, however, I want to talk about another Microsoft project, VALL-E, which is incredible.

This cutting-edge tool has been trained on a vast amount of voice data: over 60,000 hours of English speech. According to the Redmond company, that makes its training data set "hundreds of times larger than existing systems", including the most advanced ones.

And what did VALL-E learn to do? Oh, nothing much, a mere trifle. It reproduces and imitates anyone's voice perfectly after listening to it for just three seconds.

AI voice: with VALL-E, 3 seconds are enough to clone your voice.

A voice replicator?

Not only that. VALL-E is a real revolution in the field of voice AI: it reproduces with extraordinary precision the emotions, vocal tones and acoustic environment present in a given sample, and it is a giant step forward compared to existing text-to-speech (TTS) systems. In other words, VALL-E's voice sounds much more like a human being than like an artificial intelligence.

On his LinkedIn profile (visit it), the digital strategist Alberto Giacobone links to a small library of voice samples created by VALL-E and published on GitHub. The results are surprising: in many clips the intonation and accent of the speakers' voices are perfectly reproduced.

Some examples are less convincing, and this shows that VALL-E is not yet a finished product. However, the overall output is so convincing that it leaves us speechless.

An example of the first results obtained by VALL-E. Above, the original audio sample. Below, the “cloned” voice.

Big risks, big potential

Clearly, this technology raises concerns about misuse, such as identity theft. VALL-E will be able to create voice deepfakes indistinguishable from real voices, which could be used to deceive people in many ways.

To counter this threat, in the paper presenting VALL-E (linked here) Microsoft says it is working on a detection model that can distinguish a real voice from a synthetic one.

Despite the (big) risks, however, tools like VALL-E could be particularly useful for helping people regain their voice after an accident, for effortlessly creating more natural podcasts and audiobooks and… as always, the only limit is your imagination.

Gianluca Riccio, creative director of Melancia adv, copywriter and journalist. He is part of the Italian Institute for the Future, World Future Society and H+. Since 2006 he has directed Futuroprossimo.it, the Italian Futurology resource.

