Artificial intelligence and machine learning algorithms that can read lips from video are actually nothing extraordinary.
In 2016, researchers from Google and the University of Oxford detailed a system capable of lip-reading and annotating video with 46.8% accuracy. Does that sound low? It already outperformed the 12.4% accuracy of a professional human lip reader. And LIBS did not exist yet.
However, 46.8% is not up to par with what artificial intelligence can do today. State-of-the-art systems still struggle with ambiguities in lip movements, which keeps their performance below that of audio-based speech recognition.
In search of a better-performing system, researchers from Alibaba, Zhejiang University and the Stevens Institute of Technology devised a method dubbed Lip-by-Speech (LIBS), which uses features extracted from speech recognizers as complementary cues. The system raises the bar by a further 8%, and there is still room for improvement.
LIBS and other similar solutions may help hearing-impaired people follow videos without subtitles. An estimated 466 million people worldwide suffer from hearing loss, about 5% of the world's population. By 2050, the number could rise to over 900 million, according to the World Health Organization.
The AI method for lip reading
LIBS derives useful information from the audio in several steps. Like a skilled cryptographer, the AI hunts for recognizable words in the speech. It then matches them to the corresponding lip movements and searches for all similar lip shapes. And it does not stop there: it also compares the timing of the video frames and other technical cues, refining the search until it can read lips even for words our ears cannot make out.
If it sounds complicated, read it again; I can't promise it will help.
Quoting from the paper presenting the technology: "Both the speech recognizer and the LIBS lip-reader component are based on an attention-based sequence-to-sequence architecture, a method from machine translation that maps an input sequence (audio or video) to an output sequence."
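To make the quoted architecture concrete, here is a minimal sketch of the attention step at the heart of such models. It is not LIBS itself, just the generic mechanism, under assumed toy dimensions: the decoder scores every encoded input frame against its current state and takes a weighted average as context.

```python
import numpy as np

def attention_context(decoder_state, encoder_states):
    """Scaled dot-product attention over one sequence.
    Score each encoder state against the decoder state, softmax
    the scores over time, and return the weighted sum (context)
    along with the attention weights."""
    scores = encoder_states @ decoder_state          # (T,) similarity scores
    scores = scores / np.sqrt(decoder_state.size)    # scale for stability
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()                # softmax over time steps
    context = weights @ encoder_states               # (D,) weighted sum
    return context, weights

# Hypothetical example: 5 encoded frames (audio or video) of dim 4.
rng = np.random.default_rng(0)
enc = rng.normal(size=(5, 4))    # encoder outputs
dec = rng.normal(size=4)         # current decoder hidden state
ctx, w = attention_context(dec, enc)
```

At each decoding step the model re-computes these weights, which is what lets it "look at" different input frames while emitting each output token.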
Researchers trained the AI on a first dataset containing over 45,000 sentences spoken in BBC broadcasts, and on CMLR, the largest available corpus for lip reading in Mandarin Chinese, with over 100,000 natural sentences.
The fields of application are not limited to helping the deaf. The habit of attributing a "socially noble" use to every technology must never make us forget that the main use of such technologies is in the military and security sectors.
Has nobody considered that this system could make security surveillance even more infallible and pervasive, through amazing new security cameras or new satellite systems?
With AI having become an omniscient eye, it will be child's play to listen to (or reconstruct) our whispers even from an orbiting satellite.