US-based scientists at the Flatiron Institute’s Center for Computational Biology (CCB) and Princeton University have created a machine learning framework called ExPecto, which can predict the effects of mutations in the non-coding ‘dark matter’ regions of the human genome and could one day aid in drug therapy selection.

ExPecto can pinpoint how specific mutations disrupt the way genes turn on and off throughout a patient’s body, and could help to avoid some of the fatal consequences of disrupted gene expression.

The scientists have published their study in Nature Genetics. They reported using the new method to compute the genetic consequences of more than 140 million mutations in different tissues and to pinpoint the mutations potentially responsible for increasing the risk of several immune-related diseases, including chronic hepatitis B, Crohn’s disease and Behçet’s disease.

CCB deputy director of genomics and Princeton professor Olga Troyanskaya is the principal investigator of the study. She said: “ExPecto can examine any genetic variant and predict its effect on gene expression. That’s incredibly exciting.”

Troyanskaya and her colleagues took a different approach from previous genome mutation studies, developing ExPecto as a program that can read a raw sequence of DNA and predict the corresponding effect on gene expression.

ExPecto uses deep learning methods. Using a single reference genome, the researchers trained the program to understand how DNA controls gene expression across more than 200 different tissues and cell types. From this information, the program can predict the effect of any mutation, even mutations that scientists have never seen before.
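
To illustrate the general idea of scoring a mutation by comparing a model's predictions for the reference and the mutated sequence, here is a minimal Python sketch. It is not ExPecto's code: the scoring function, its weights and all function names are hypothetical stand-ins for a trained deep learning model, chosen only to show how a program that reads raw DNA could, in principle, score mutations it has never seen.

```python
# Toy illustration only: NOT ExPecto's model or API.
# TOY_WEIGHTS and predict_expression are hypothetical stand-ins for a
# trained sequence-to-expression model.
TOY_WEIGHTS = {"A": 0.1, "C": -0.2, "G": 0.3, "T": -0.1}


def predict_expression(sequence: str) -> float:
    """Return a made-up expression score for a DNA sequence."""
    # Each base contributes its weight scaled by its (1-based) position.
    return sum(TOY_WEIGHTS[base] * (i + 1) for i, base in enumerate(sequence))


def mutate(sequence: str, position: int, alt: str) -> str:
    """Return a copy of the sequence with one base substituted (0-based)."""
    return sequence[:position] + alt + sequence[position + 1:]


def variant_effect(reference: str, position: int, alt: str) -> float:
    """Predicted change in expression: mutated score minus reference score."""
    mutated = mutate(reference, position, alt)
    return predict_expression(mutated) - predict_expression(reference)


def scan_all_variants(reference: str) -> dict:
    """Score every possible single-base substitution in the sequence."""
    effects = {}
    for pos, ref_base in enumerate(reference):
        for alt in "ACGT":
            if alt != ref_base:
                effects[(pos, ref_base, alt)] = variant_effect(reference, pos, alt)
    return effects


if __name__ == "__main__":
    # Print substitutions ordered by the size of their predicted effect.
    ranked = sorted(scan_all_variants("ACGT").items(), key=lambda kv: -abs(kv[1]))
    for (pos, ref_base, alt), effect in ranked:
        print(f"{ref_base}{pos}{alt}: {effect:+.2f}")
```

Because the score depends only on the sequence itself, any substitution can be evaluated, whether or not it has been observed in a patient. A real system would replace the toy scoring function with a trained model and produce a separate prediction for each tissue.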

The researchers hope that ExPecto will one day help medical experts identify the genetic contributors to a patient’s disease and enable them to develop therapies customised to the patient’s genome.

CCB Flatiron research fellow and co-author of the study Jian Zhou said: “Once you know which protein is affected and what the protein does, then you can design drugs that can fix the problem. If you can’t produce a certain protein, then you could design a therapy that makes up for the missing protein.”

ExPecto’s predictions are available for anyone to access online as part of HumanBase, a data-driven prediction system about human biology and disease developed by the research team. Typing in the name of a gene displays all the potential mutations that could affect that gene’s expression in any of 218 tissues and cell types.