Researchers at the Max Planck Society have used artificial intelligence to analyze cancer data in depth. The results are published in Nature Machine Intelligence.

Previously unknown genes

A new algorithm can predict which genes cause cancer, even if their DNA sequence has not changed. A team of researchers in Berlin combined a wide variety of data and analyzed it with artificial intelligence to identify numerous oncogenes. This opens up new perspectives for targeted cancer treatments and for the development of biomarkers in personalized medicine.

In cancer, cells multiply uncontrollably and invade tissues, destroying organs and disrupting their vital functions. This unchecked growth is usually caused by an accumulation of DNA changes in oncogenes, that is, mutations in genes that control cell development. However, some cancers have very few mutated genes, which means that other causes must lead to the disease in these cases.

A team of researchers from the Max Planck Institute for Molecular Genetics (MPIMG) in Berlin and the Institute of Computational Biology of Helmholtz Zentrum München has used a machine learning algorithm to identify 165 previously unknown cancer genes.

The sequences of these genes are not necessarily altered. Apparently, a disruption of their regulation alone can lead to cancer. All of the newly identified genes interact closely with well-known oncogenes, and experiments in cell cultures have shown that they are essential for the survival of tumor cells.

Additional targets for personalized medicine

The algorithm, called EMOGI (Explainable Multi-Omics Graph Integration), can also explain the relationships within the cellular machinery that turn a gene into an oncogene. As the group of researchers led by Annalisa Marsico describes in the journal Nature Machine Intelligence, the software integrates tens of thousands of datasets generated from patient samples. In addition to sequence data with mutations, these include information on DNA methylation, the activity of individual genes, and the interactions of proteins within cellular pathways. In these data, deep learning algorithms detect the patterns and molecular principles that lead to the development of cancer.
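To make the idea of graph integration more concrete, the following is a minimal sketch and not the EMOGI implementation itself. It assumes that each gene is a node in a protein interaction network and carries a small vector of omics measurements, and it applies one generic graph convolution step, a common building block of deep learning on networks, so that each gene's representation also reflects the data of its interaction partners. The gene names, feature values, weights, and function names (`normalize_adjacency`, `graph_conv`) are hypothetical and chosen purely for illustration.

```python
import numpy as np


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Add self-loops and apply symmetric degree normalization."""
    a_hat = adj + np.eye(adj.shape[0])      # each gene also keeps its own signal
    deg = a_hat.sum(axis=1)                 # node degree including the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(adj_norm: np.ndarray, features: np.ndarray,
               weights: np.ndarray) -> np.ndarray:
    """One graph convolution step: mix neighbor features, project, apply ReLU."""
    return np.maximum(adj_norm @ features @ weights, 0.0)


# Hypothetical toy network of four genes (1 = the two proteins interact).
genes = ["gene_a", "gene_b", "gene_c", "gene_d"]
adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Made-up per-gene features, one column per data type:
# mutation frequency, methylation change, expression change.
features = np.array([
    [0.30, 0.05, 0.10],
    [0.01, 0.40, 0.35],
    [0.02, 0.10, 0.05],
    [0.00, 0.02, 0.01],
])

# Random weights stand in for parameters that a real model would learn.
rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 2))

adj_norm = normalize_adjacency(adj)
hidden = graph_conv(adj_norm, features, weights)  # shape: (4 genes, 2 hidden units)
```

In this sketch a gene with few mutations of its own, such as the hypothetical `gene_b`, still receives signal from its neighbors in the network. That is the intuition behind combining interaction data with per-gene measurements; a real model would stack several such layers and train the weights on known cancer genes.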

Unlike traditional cancer treatments such as chemotherapy, personalized treatments are tailored to the specific type of tumor. “Our goal is to choose the best treatment for each patient, that is, the most effective treatment with the fewest side effects. In addition, molecular properties can be used to identify cancers that are still in their early stages,” explains Marsico, head of a research group at the MPIMG.

“Only by knowing the cause of the disease can we effectively counteract or correct it,” the researchers write. “This is why it is so important to identify as many of the mechanisms that cause cancer as possible.”

Better results with a combination

“To date, most research has focused on pathogenic changes in sequence or cellular circuitry,” said Roman Schulte-Sasse, a doctoral student on Marsico’s team and first author of the publication. “At the same time, it has recently become clear that epigenetic disorders or dysregulation of gene activity can also lead to cancer.”

This is why the researchers combined sequence data, which reflect errors in the DNA itself, with information that reflects events in cells. The scientists first confirmed that mutations or the duplication of genomic segments are indeed the main cause of cancer. Then, in a second step, they identified candidate genes that are less directly connected to the genes that actually drive cancer.

“For example, we found genes whose sequence is largely unchanged in cancer, yet which are indispensable for the tumor because they regulate its energy supply,” says Schulte-Sasse. “These genes are out of control for other reasons, for example because of chemical changes to the DNA such as methylation. These changes leave the sequence information untouched but govern gene activity. Such genes are promising targets for drug discovery, but because they work in the background, they can only be found with sophisticated algorithms.”

Further research

The researchers’ new program adds many new entries to the list of suspected oncogenes, which has grown from 700 to 1,000 in recent years alone. The researchers were only able to track down the hidden genes by combining bioinformatics analysis with modern artificial intelligence (AI) techniques.

There are many more interesting details hidden in the data. “We see a lot of cancer patterns,” says Marsico. “I think this is evidence that different molecular mechanisms in different organs cause tumors.”

The researchers emphasize that the EMOGI program is not limited to cancer. In theory, it can be used to integrate diverse sets of biological data and find patterns in them, so the algorithm could also be applied to similarly complex diseases.