New algorithm could accelerate diagnosis of genetic diseases using clinical records

Wu Tsai Neurosciences Institute, Gill Bejerano
Image by Darryl Leja, NHGRI

By Helen Santoro

In a continued effort to speed up the diagnostic process of severe genetic diseases, Stanford's Gill Bejerano, PhD, and his colleagues have developed a new algorithm that can quickly locate important disease-related information within a patient's medical record.

In a paper recently published in Genetics in Medicine, Bejerano and Cole Deisseroth, a Bio-X undergraduate fellow, along with researchers including Johannes Birgmeier, a graduate student in computer science, describe an algorithm that scans a patient's records and extracts that patient's key phenotypes, or observable traits.

The team focused particularly on patients with life-threatening genetic diseases such as sickle cell anemia, cystic fibrosis and Huntington's disease. Reading through patient notes by hand, a dedicated clinician can process around 200 patient records in a 40-hour work week, the researchers said. The algorithm can do the same job in 10 minutes, and it saves busy doctors an additional three to five hours for every downstream disease diagnosis.
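To put those figures side by side, here is a small back-of-the-envelope calculation. It uses only the numbers quoted above; the variable names are illustrative, and the result is a rough ratio rather than a benchmark reported by the researchers.

```python
# Figures quoted in the article.
records_per_week = 200      # records a clinician can review by hand
hours_per_week = 40         # length of the work week
algorithm_minutes = 10      # time the algorithm needs for the same records

# Convert the manual effort to minutes so both sides use the same unit.
manual_minutes = hours_per_week * 60
minutes_per_record = manual_minutes / records_per_week

# Ratio of manual time to algorithm time for the same batch of records.
speedup = manual_minutes / algorithm_minutes

print(f"Manual review: {minutes_per_record:.0f} minutes per record")
print(f"Algorithm is roughly {speedup:.0f} times faster on the same batch")
```

By this arithmetic, manual review works out to about 12 minutes per record, and the algorithm handles the same batch roughly 240 times faster.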

"A diagnosis is extremely valuable for the patient, for the family and for the attending clinician," said Bejerano. But finding the diagnosis within the patient's genome is very time-consuming, and "for that we need computational tools."

The algorithm is called ClinPhen, a combination of "clinical" and "phenotypes." It first breaks the sentences in a patient's medical records into short phrases. For example, if the clinical note reads, "The child has short stature and long eyelashes. She has a cleft palate and a small jaw," the algorithm would select the phrases "The child has short stature," "and long eyelashes," "She has a cleft palate" and "a small jaw."

These phrases are converted into codes from an existing phenotype database called the Human Phenotype Ontology, or HPO. The codes are then sorted, with the most frequently mentioned and earliest-mentioned phenotypes at the top of the list. ClinPhen can also recognize flag words such as "father" and "does not have." When a sentence contains one, the algorithm does not associate any phenotype mentioned in that sentence with the patient.
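The steps described above can be sketched in a few lines of Python. This is a toy illustration, not ClinPhen's actual code: the function name, the four-entry lookup table and the splitting rules are all assumptions made for the example, and the HPO identifiers shown should be checked against a current HPO release. The real tool works from the full ontology and a much richer set of rules.

```python
import re
from collections import defaultdict

# Hypothetical, tiny phrase-to-HPO lookup table (illustrative only).
TOY_HPO = {
    "short stature": "HP:0004322",
    "long eyelashes": "HP:0000527",
    "cleft palate": "HP:0000175",
    "small jaw": "HP:0000347",
}

# Flag words taken from the article's two examples.
FLAGS = ("father", "does not have")


def extract_phenotypes(note):
    """Return HPO codes sorted by mention count, then by earliest mention."""
    counts = defaultdict(int)
    first_seen = {}
    position = 0  # running index of phrases, used for "earliest-mentioned"

    # Split the note into sentences at end-of-sentence punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", note.strip()):
        lowered = sentence.lower()
        # Skip sentences about relatives or absent traits.
        if any(flag in lowered for flag in FLAGS):
            continue
        # Break the sentence into short phrases at commas and "and".
        for phrase in re.split(r",|\band\b", lowered):
            for term, code in TOY_HPO.items():
                if term in phrase:
                    counts[code] += 1
                    first_seen.setdefault(code, position)
            position += 1

    # Most-mentioned first; ties go to the phenotype mentioned earliest.
    return sorted(counts, key=lambda c: (-counts[c], first_seen[c]))


note = (
    "The child has short stature and long eyelashes. "
    "She has a cleft palate and a small jaw. "
    "Her father has long eyelashes. "
    "Short stature was noted again at age two."
)
print(extract_phenotypes(note))
```

In this example the sentence about the father is skipped, so the father's long eyelashes are not counted for the patient, and short stature rises to the top of the list because it is mentioned twice.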

ClinPhen's accuracy, phenotype filter and speed were validated using six sets of real patient clinical notes from four different medical centers. Compared with other phenotype extraction algorithms, ClinPhen was more precise and 20 times faster, the research showed.

"ClinPhen actually guesses slightly better than a clinician," Bejerano said. "We can capture clinician intuition and pick the right set of phenotypes to best facilitate the diagnosis."

The HPO codes produced by ClinPhen are then fed to Phrank, another algorithm designed by Bejerano and his colleagues. Phrank ranks a patient's genes that carry rare variants by how well they explain the phenotypes, or traits, identified by ClinPhen.

Bejerano describes these efforts as a "computational ecosystem," and said he hopes to see this ecosystem implemented in clinical settings soon.

"The medical establishment is very conservative. And for good reason... You want to protect the patient," Bejerano said. "[But] I think that slowly, very slowly, there is a shift towards using more of these automated systems. With 60 million patients to be sequenced in the next several years, we simply have no choice."