Researchers from Massachusetts General Hospital (MGH), the Massachusetts Institute of Technology (MIT) and the University of Michigan (U-M) have developed machine learning models that can accurately predict a patient’s risk of developing an infection with the gut bacterium Clostridium difficile (C. difficile).

The team published their findings in ‘A generalizable, data-driven approach to predict daily risk of Clostridium difficile infection at two large academic health centers’ in the journal Infection Control & Hospital Epidemiology. The authors noted that previous models for predicting infection considered only a small number of risk factors and were not tailored to individual patients or hospitals, which limited their effectiveness.

“When data are simply pooled into a one-size-fits-all model, institutional differences in patient populations, hospital layouts, testing and treatment protocols, or even in the way staff interact with the EHR [electronic health record] can lead to differences in the underlying data distributions and ultimately to poor performance of such a model,” said Jenna Wiens, PhD, assistant professor of computer science and engineering at U-M and co-senior author of the study.

“To mitigate these issues, we take a hospital-specific approach, training a model tailored to each institution.”

The researchers analysed EHR data from almost 257,000 patients, covering factors that ranged from individual patient demographics to each patient’s likelihood of exposure to C. difficile. Data was collected from MGH over a period of two years, and from Michigan Medicine, U-M’s academic medical centre, over six years.

The models generated a daily risk score for each patient, classifying a patient as high risk when the score exceeded a set threshold.
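The thresholding step described above can be sketched in a few lines of Python. This is only an illustration, not the study's code: the `DailyScore` record, the `flag_high_risk` function and the `RISK_THRESHOLD` value of 0.5 are all hypothetical, since the article does not state the cutoff or the data layout the researchers used.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the article does not state the threshold used.
RISK_THRESHOLD = 0.5


@dataclass
class DailyScore:
    patient_id: str
    day: int      # day of the hospital stay, starting at 1
    risk: float   # assumed model output between 0 and 1


def flag_high_risk(scores, threshold=RISK_THRESHOLD):
    """Return {patient_id: first day on which the score exceeded threshold}."""
    first_flag = {}
    # Walk the scores in day order so the earliest crossing is recorded.
    for s in sorted(scores, key=lambda s: s.day):
        if s.risk > threshold and s.patient_id not in first_flag:
            first_flag[s.patient_id] = s.day
    return first_flag
```

In this sketch a patient who never crosses the threshold simply does not appear in the result, so the returned dictionary doubles as the list of patients classified as high risk.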

The risk scores proved to be an accurate means of predicting which patients would be diagnosed with C. difficile. In half of those who were infected, an accurate prediction could have been made at least five days before samples were collected for diagnosis, a lead time that would allow patients to receive antimicrobial interventions. The algorithm behind the models is also freely available for others to review and adapt for other institutions.
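A lead-time figure like the one above could be computed along the following lines. This is a rough sketch under stated assumptions rather than the researchers' evaluation code: the function names and the two input dictionaries (first day a patient was flagged as high risk, and day the diagnostic sample was collected) are hypothetical.

```python
def lead_times(first_flag_day, sample_day):
    """Days between the first high-risk flag and the diagnostic sample.

    Both arguments are hypothetical {patient_id: day} dictionaries.
    Patients flagged only after sampling, or never sampled, are skipped.
    """
    return {
        pid: sample_day[pid] - flag_day
        for pid, flag_day in first_flag_day.items()
        if pid in sample_day and flag_day <= sample_day[pid]
    }


def share_flagged_early(first_flag_day, sample_day, min_days=5):
    """Fraction of diagnosed patients flagged at least min_days before sampling."""
    if not sample_day:
        return 0.0
    times = lead_times(first_flag_day, sample_day)
    early = sum(1 for days in times.values() if days >= min_days)
    # Diagnosed patients who were never flagged still count in the denominator.
    return early / len(sample_day)
```

A result of 0.5 or more from `share_flagged_early` with `min_days=5` would correspond to the article's statement that half of affected patients could have been identified at least five days ahead.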

C. difficile infects almost half a million Americans and kills nearly 30,000 each year. The infection most often affects people who are over 65 years old, have a weakened immune system, or have underlying conditions such as inflammatory bowel disease, cancer or kidney disease. Treatment cost $4.8 billion in the US in 2015 alone.

“This represents a potentially significant advance in our ability to identify and ultimately act to prevent infection with C. difficile,” said Vincent Young, MD, PhD, co-author of the study.

“The ability to identify patients at greatest risk could allow us to focus expensive and potentially limited prevention methods on those who would gain the greatest potential benefit. I think that this project is a great example of a ‘team science’ approach to addressing complex biomedical questions to improve healthcare, which I expect to see more of as we enter the era of precision health.”