The first end-to-end, completely gapless DNA sequence of a human chromosome is an important milestone in genomics research. Although the current human reference genome is the most accurate and complete vertebrate genome ever produced, gaps still remain in its DNA sequence, even after two decades of improvement. Now, for the first time, scientists have determined the complete sequence of a human chromosome from one end to the other (telomere to telomere) with no gaps and an unprecedented level of accuracy. The results of the study are published in the journal Nature.

The complete assembly of the human X chromosome is a landmark achievement for genomics researchers. Lead author of the study, Karen Miga of the UC Santa Cruz Genomics Institute, said the project was made possible by new sequencing technologies, in particular nanopore sequencing, which can read long segments of DNA while leaving the molecules intact.

Repetitive DNA sequences are spread throughout the genome and have always been a problem for sequencing. Most technologies produce relatively short "reads" of the sequence, which then have to be pieced together like a jigsaw puzzle to assemble the genome. Repetitive sequences yield many short reads that look almost identical, which makes it hard to tell how different parts of the genome fit together or how many copies of a repeat there are.
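
The ambiguity described above can be illustrated with a toy example. The following is a minimal sketch, not the method used in the study: the sequences, the repeat unit, and the helper name `reads` are all made up for illustration, and reads are treated as error-free substrings. It compares two hypothetical genomes that differ only in how many copies of a repeat they contain.

```python
def reads(seq, length):
    """Return the set of all distinct substrings of the given length.

    This is an idealized stand-in for sequencing reads: every position
    is covered and there are no errors (an assumption for illustration).
    """
    return {seq[i:i + length] for i in range(len(seq) - length + 1)}


# Hypothetical sequences: the same flanks around 5 or 8 copies of a repeat unit.
UNIT = "GATTACA"
LEFT, RIGHT = "CCCTTT", "AAAGGG"
genome_a = LEFT + UNIT * 5 + RIGHT
genome_b = LEFT + UNIT * 8 + RIGHT

# Short reads (10 bases) never span the whole repeat array, so both
# genomes produce exactly the same set of distinct reads.
short_reads_match = reads(genome_a, 10) == reads(genome_b, 10)

# A read long enough to span the array in genome_a, flanks included,
# no longer matches anything in genome_b.
long_reads_match = reads(genome_a, len(genome_a)) == reads(genome_b, len(genome_a))

print("short reads identical:", short_reads_match)
print("long reads identical:", long_reads_match)
```

In this sketch the short reads cannot tell the two genomes apart, while a single read spanning the repeat and both flanks can. Real data is messier: reads contain errors, and how often each read occurs gives a rough but noisy hint about copy number.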

These repetitive sequences were once considered intractable, but sequencing technology has now made a breakthrough. With nanopore sequencing, we get ultra-long reads of hundreds of thousands of base pairs that can span an entire repeat region, which gets around some of these problems.

Karen Miga, researcher at the UC Santa Cruz Genomics Institute

Filling in the remaining gaps in the human genome sequence opens up new regions of the genome to study. Scientists can now look for associations between sequence variations and specific diseases, as well as other clues to important questions about human biology and evolution.

Although the human genome reference sequence was created about twenty years ago, it still contains hundreds of gaps. Most of the missing sections are repetitive DNA, but they may include functional elements relevant to certain diseases.

Scientists emphasize that many of the previously inaccessible parts of the genome are among the richest in variation across human populations. This information may be important for understanding human biology in general.

Karen Miga and Adam Phillippy of the National Human Genome Research Institute (NHGRI) co-founded the Telomere-to-Telomere (T2T) consortium to complete the genome assembly. They had previously worked together on a 2018 paper that demonstrated the potential of nanopore technology to produce a complete human genome sequence. For their purposes, the scientists used the Oxford Nanopore Technologies MinION sequencer, which records the changes in electrical current that occur as individual DNA molecules pass through nanopores in a membrane. The researchers then analyzed these large DNA molecules again on two different instruments, each of which generates very long reads.

To combine the data, the scientists used their own software to assemble the many reads that were generated. With this approach, the team produced a whole-genome human assembly that surpasses all previous assemblies in continuity, completeness, and accuracy. In some respects it even exceeds the current human reference genome.

According to the scientists, however, several breaks still remained in this sequence. To complete the X chromosome, the team had to close several gaps manually.

The new human genome sequence, derived from a human cell line called CHM13, closes many gaps in the current reference genome, known as Genome Reference Consortium build 38 (GRCh38).