The first draft of the human genome was published exactly 20 years ago. It took nearly three years to complete, at a cost of nearly a billion dollars. The Human Genome Project has allowed scientists to read, almost from one end to the other, the 3 billion base pairs of DNA (or "letters") that biologically define a human being.
It was an epochal undertaking. The project allowed a new generation of researchers to identify new targets for cancer treatments, engineer mice with human immune systems and even build a web page where you can navigate the entire human genome as if it were Google Maps.
That first human genome was generated from a handful of anonymous donors, with the aim of producing a reference genome that represented more than a single individual. As expected, it was not enough to capture the wide diversity of human populations around the world. No two people are alike, and no two genomes are alike. To understand humanity in all its diversity with greater precision, researchers would need far more than a single reference genome.
Thousands or millions of genomes would have to be sequenced, and this is precisely the purpose of a project currently underway.
Understanding genetic diversity
The richness of genetic diversity among people is what makes each person unique. But genetic changes also cause many ailments and make some groups of people more susceptible to certain diseases than others.
At the time of the Human Genome Project, researchers were also sequencing the complete genomes of simpler organisms such as mice, fruit flies, yeast and some plants. The enormous effort made to generate these first genomes led to a revolution in the technology required to read genomes. That technology has advanced to the point that sequencing an entire human genome no longer takes years and millions of euros. Now it takes a few days and costs less than a thousand euros.
Thousands of genomes
Advances in technology have allowed scientists to sequence the complete genomes of thousands of individuals from around the world. Initiatives such as the Genome Aggregation Consortium are making great efforts to collect and organize this scattered data. So far, the consortium has been able to collect nearly 150,000 genomes. Within this dataset, the researchers found more than 241 million differences in people's genomes, with an average of one variant every eight base pairs.
Most of these variations are very rare and will have no effect on a person. However, hidden among them are variants with important physiological and medical consequences. For example, some variants of the BRCA1 gene predispose certain groups of women, such as Ashkenazi Jews, to ovarian and breast cancer. Other variants in that gene lead some Nigerian women to experience higher-than-normal mortality from breast cancer.
How to identify these variants of the human genome?
The best way for researchers to identify these types of variants at the population level is through studies that compare the genomes of large groups of people who have a condition with those of a control group. But diseases are complicated. An individual's lifestyle, symptoms and time to onset can vary greatly, and the effect of genetics on many diseases is difficult to distinguish. The predictive power of current genomics research is too low to tease apart many of these effects because there are not enough genomic data.
Understanding the genetics of complex diseases, particularly those related to genetic differences between ethnic groups, is essentially a big data problem. And researchers need more data. Much more data.
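To make the data problem concrete, here is a minimal sketch of the kind of comparison such a study performs: an odds ratio for carrying a variant among people with a disease versus a control group, with an approximate 95% confidence interval. All counts are hypothetical and chosen only for illustration, the function names are not taken from any genomics library, and real studies use far more elaborate statistics. The sketch only shows that the same observed effect can be indistinguishable from chance in a small sample and clear in a sample one hundred times larger.

```python
import math


def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Odds of carrying the variant among cases divided by the odds among controls."""
    return (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)


def confidence_interval_95(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Approximate 95% interval from the standard error of the log odds ratio."""
    counts = (case_carriers, case_noncarriers, control_carriers, control_noncarriers)
    log_or = math.log(odds_ratio(*counts))
    standard_error = math.sqrt(sum(1 / n for n in counts))
    return (math.exp(log_or - 1.96 * standard_error),
            math.exp(log_or + 1.96 * standard_error))


# Hypothetical counts: (cases with variant, cases without,
#                       controls with variant, controls without)
small_study = (30, 970, 20, 980)
# Same carrier frequencies, one hundred times more participants
large_study = tuple(n * 100 for n in small_study)

for name, counts in (("small", small_study), ("large", large_study)):
    low, high = confidence_interval_95(*counts)
    print(f"{name}: odds ratio {odds_ratio(*counts):.2f}, "
          f"95% interval {low:.2f} to {high:.2f}")
```

With these made-up numbers the odds ratio is the same in both studies, but only the larger one yields an interval that excludes 1, the value that would mean no association at all.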
To address the need for more data, the National Institutes of Health initiated a program called All of Us. The project aims to collect genetic information, medical records and health habits, gathered through surveys and wearable devices, from more than one million people in the United States over the course of 10 years. It opened to the public in 2018, and more than 270,000 people have contributed samples since then.
The great potential of this project lies in the possibility of doing research by cross-referencing very different kinds of data. A neuroscientist could look for genetic variations associated with depression while taking into account, for example, exercise levels. An oncologist might look for variants related to skin cancer risk based on ethnic differences.
With one million human genomes we will have an extraordinary wealth of data to discover the effects of genetic variation on disease, not only for individuals, but also within different groups of people.
The dark forest of the human genome
Another advantage of this project is that it will allow scientists to learn about parts of the human genome that are currently very difficult to study. Most genetic research has focused on the parts of the genome that code for proteins. However, these represent only 1.5% of the human genome.
One promising line of research focuses on RNA, a molecule that transforms the messages encoded in a person's DNA into proteins. However, RNAs that come from the 98.5% of the human genome that does not code for proteins have a host of other functions. Some of these RNAs are involved in processes such as the way in which cancer spreads, embryonic development or control of the X chromosome in females. Because the All of Us project includes all coding and non-coding parts of the genome, it will be by far the largest dataset available to shed light on these mysterious RNAs.