Machine learning now generates realistic genomes of people who don't exist

February 5, 2021

Technology

We have seen GANs generate faces, fake biographies and songs. Now they also generate artificial genomes that are mostly indistinguishable from real ones.

Thanks to new algorithms and advances in computing, machines can now learn increasingly complex models. They can generate high-quality synthetic data such as photorealistic images, and even résumés of imaginary people.

Now a study published in the international journal PLoS Genetics shows an advanced use of machine learning on biometric data. Starting from existing biobanks, the system generates entire blocks of human genome that do not belong to real humans but have the characteristics of real genomes.

Bypassing the privacy issue

“Existing genomic databases are an invaluable resource for biomedical research,” says Burak Yelmen, first author of the study and Junior Research Fellow of Modern Population Genetics at the University of Tartu. “The problem is that they are either not publicly accessible or protected by long and drawn-out application procedures due to valid ethical concerns. This creates a major scientific barrier for researchers. A machine-generated genome, an ‘artificial genome’, can help us overcome the problem within a safe ethical framework.”

A chromosome emerges from random digital noise. Credit: Burak Yelmen

The multidisciplinary team performed multiple analyses to evaluate the quality of the machine-generated genomes compared to real ones. “Remarkably, these genomes mimic the complexities we can observe within real human populations and, for most properties, they are indistinguishable from the other genomes of the biobank used to train our algorithm. Except for one detail: they do not belong to any gene donor,” said Dr. Luca Pagani, one of the study's senior authors and a Mobilitas Pluss fellow.


Is the genome truly original or a carbon copy?

The study also evaluates how close the artificial genomes are to the real genomes, to verify whether the privacy of the original samples is preserved. “While detecting privacy leaks across thousands of genomes might seem like looking for a needle in a haystack, combining multiple statistical measures allows us to carefully check all patterns. Interestingly, detailed exploration of complex leakage patterns in turn leads to other improvements in the assessment of GANs and will fuel the field of machine learning,” says Dr. Flora Jay, study coordinator and researcher at the CNRS (French National Center for Scientific Research).
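To give a rough idea of what checking "proximity" can mean, here is a minimal sketch in Python. It is not the study's method: the article does not say which statistical measures the team combined. The sketch assumes a toy encoding in which each genome block is a list of 0/1 alleles, and all function names, the sample data and the threshold are hypothetical. For each artificial block it measures the distance to the closest real block, and flags the ones that sit suspiciously close to a real donor.

```python
from typing import List, Sequence

# Toy encoding (an assumption for illustration): one 0/1 allele per variant position.
Haplotype = Sequence[int]


def hamming(a: Haplotype, b: Haplotype) -> int:
    """Count the positions at which two haplotypes differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same positions")
    return sum(x != y for x, y in zip(a, b))


def nearest_real_distance(synthetic: Sequence[Haplotype],
                          real: Sequence[Haplotype]) -> List[int]:
    """For each synthetic haplotype, distance to its closest real haplotype."""
    return [min(hamming(s, r) for r in real) for s in synthetic]


def flag_possible_copies(synthetic: Sequence[Haplotype],
                         real: Sequence[Haplotype],
                         threshold: int = 0) -> List[int]:
    """Indices of synthetic haplotypes within `threshold` differences of a real one."""
    distances = nearest_real_distance(synthetic, real)
    return [i for i, d in enumerate(distances) if d <= threshold]


# Made-up example data, not taken from any biobank.
real_panel = [
    [0, 1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 0],
]
artificial = [
    [0, 1, 1, 0, 1, 0],  # identical to the first real haplotype
    [1, 0, 1, 1, 0, 0],  # differs from every real haplotype
]

distances = nearest_real_distance(artificial, real_panel)
copies = flag_possible_copies(artificial, real_panel)
print(distances, copies)
```

In this toy case the first artificial block is an exact copy of a real one and would be flagged as a potential privacy leak, while the second is not. A single distance measure like this is only one possible signal, which is in line with the researchers' point about combining several.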

All in all, machine learning approaches have already provided faces, biographies and many other features to a handful of imaginary human beings. Now we know more about their biology as well. These fictional humans with realistic genomes could serve as a test bench in place of real genomes that are not publicly available.

The research could remove a major accessibility barrier in genomics research, particularly for underrepresented populations.

Gianluca Riccio, creative director of Melancia adv, copywriter and journalist. He is part of the Italian Institute for the Future, World Future Society and H+. Since 2006 he has directed Futuroprossimo.it, the Italian Futurology resource.
