Thanks to new algorithms and advances in computing, machines can now learn increasingly complex models. They can generate high-quality synthetic data such as photorealistic images, and even résumés of imaginary humans.
Now a study published in the international journal PLoS Genetics shows an advanced use of machine learning on biometric data. Starting from existing biobanks, the system generates entire blocks of human genome that do not belong to real humans but have the characteristics of a real genome.
Bypassing the privacy issue
“Existing genomic databases are an invaluable resource for biomedical research,” says Burak Yelmen, first author of the study and Junior Research Fellow of Modern Population Genetics at the University of Tartu. “The problem is that they are either not publicly accessible or protected by long, drawn-out application procedures due to valid ethical concerns. This creates a major scientific barrier for researchers. A machine-generated genome, an ‘artificial genome’, can help us overcome the problem within a safe ethical framework.”
The multidisciplinary team performed multiple analyses to evaluate the quality of the genomes generated by machine learning compared with real ones. “Remarkably, these genomes mimic the complexities we can observe within real human populations and, for most properties, they are indistinguishable from the other genomes of the biobank used to train our algorithm. Except for one detail: they do not belong to any gene donor,” said Dr. Luca Pagani, one of the study's senior authors and a Mobilitas Pluss fellow.
Is the genome truly original or a spitting image?
The study also evaluates how close the artificial genomes are to the real ones, to verify whether the privacy of the original samples is preserved. “While detecting privacy leaks across thousands of genomes might seem like looking for a needle in a haystack, combining multiple statistical measures allows us to carefully check all patterns. Interestingly, detailed exploration of complex dispersion patterns in turn leads to other improvements in the assessment of GANs [generative adversarial networks] and will fuel the field of machine learning,” says Dr. Flora Jay, study coordinator and researcher at the CNRS (French National Center for Scientific Research).
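To make the idea of a proximity check concrete, here is a minimal sketch of one possible measure. It is not the method used in the study. It assumes genomes are encoded as rows of 0/1 values, one column per variable site, and the function names `hamming_to_nearest` and `privacy_report`, as well as the toy data, are hypothetical. For each artificial genome it counts the differing sites to its closest real genome, then compares that with how close real genomes are to one another. An artificial genome at distance zero would be an exact copy of a donor.

```python
import numpy as np


def hamming_to_nearest(queries, references, exclude_self=False):
    """For each row in `queries`, return the smallest Hamming distance
    (number of differing sites) to any row in `references`."""
    # Broadcast to (n_queries, n_references, n_sites) and count mismatches.
    dists = (queries[:, None, :] != references[None, :, :]).sum(axis=2)
    if exclude_self:
        # Comparing a set with itself: ignore each row's zero distance to itself.
        dists = dists.astype(float)
        np.fill_diagonal(dists, np.inf)
    return dists.min(axis=1)


def privacy_report(real, artificial):
    """Summarize how close artificial genomes sit to the real ones."""
    art_to_real = hamming_to_nearest(artificial, real)
    real_to_real = hamming_to_nearest(real, real, exclude_self=True)
    return {
        "exact_copies": int((art_to_real == 0).sum()),
        "median_artificial_to_real": float(np.median(art_to_real)),
        "median_real_to_real": float(np.median(real_to_real)),
    }


# Toy data (hypothetical): 3 real and 2 artificial genomes over 6 sites.
real = np.array([
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
])
artificial = np.array([
    [1, 1, 1, 0, 0, 0],  # deliberately identical to the second real genome
    [1, 0, 1, 0, 1, 0],
])

report = privacy_report(real, artificial)
print(report)
```

A single distance like this is only one signal. As the quote above notes, the researchers combined multiple statistical measures, so a real check would look at several such summaries together. The pairwise comparison here also holds every pair in memory at once, which is fine for a toy example but would need batching for biobank-sized data.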
All in all, machine learning approaches had already given faces, biographies and many other features to a handful of imaginary human beings. We now know more about their biology as well. These fictional humans with realistic genomes could serve as an experimental test bench in place of real genomes that are not publicly available.