Interpretable machine learning to decipher gene regulation in brain development and disruption in disease

Brain development is a complex process in which cells must self-renew and differentiate at the right place and the right time. Gene regulation during development involves regulatory sequences in the genome, which affect the expression of nearby genes, and transcription factors, proteins that bind these sequences and activate genes throughout the genome. At active regulatory sequences and genes, DNA is accessible to these proteins, while inactive DNA is tightly compacted. Understanding this regulation is critical for understanding the etiology of neurodevelopmental disease, because genetic variants observed in disease often occur in, and alter, these regulatory DNA sequences. However, linking these variants to disease mechanisms remains challenging. A further challenge in studying the developing human brain is the lack of access to tissue during human-specific events in late prenatal development. To address this gap, we will use brain organoids, millimetre-sized 3D cell cultures that resemble specific parts of brain tissue during late fetal and early postnatal development. We will measure gene expression, DNA accessibility, and protein concentrations in individual cells from brain organoids at multiple timepoints during differentiation. Next, we will train machine learning models on these datasets to predict gene expression and DNA accessibility in each cell as a function of DNA sequence and transcription factor levels. We will use “glass box” models, in which we can interpret what the model has learned, to extract the specific DNA sequences and transcription factors that determine cell state. Finally, we will computationally predict the effects of genetic variants related to brain phenotypes and pathologies. Ultimately, these models will allow us to infer the mechanistic, downstream effects of any disease-associated or de novo variant and to systematically identify variants with likely causal impact during development.
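
The idea of scoring a variant with an interpretable sequence model can be illustrated with a minimal sketch. It is not the project's actual model: the 4-bp motif weights, the `bias` value, the logistic form, and all function names below are hypothetical, chosen only to show how a predicted accessibility might depend on DNA sequence and a transcription factor level, and how a variant's effect could be read off as the difference between the alternate and reference predictions.

```python
import math

# Hypothetical log-odds weights for a 4-bp motif "ACGT" (illustrative values only).
MOTIF = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0},
]


def best_motif_score(seq):
    """Return the highest motif match score over all windows of the sequence."""
    width = len(MOTIF)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        sum(MOTIF[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


def predict_accessibility(seq, tf_level, bias=-2.0):
    """Toy prediction: logistic function of TF level times best motif score."""
    z = tf_level * best_motif_score(seq) + bias
    return 1.0 / (1.0 + math.exp(-z))


def apply_variant(seq, pos, alt):
    """Substitute a single base at a 0-based position."""
    return seq[:pos] + alt + seq[pos + 1:]


def variant_effect(seq, pos, alt, tf_level):
    """Predicted accessibility of the alternate allele minus the reference."""
    ref_pred = predict_accessibility(seq, tf_level)
    alt_pred = predict_accessibility(apply_variant(seq, pos, alt), tf_level)
    return alt_pred - ref_pred


reference = "TTACGTTT"  # contains one perfect match to the toy motif
effect_tf_present = variant_effect(reference, 3, "T", tf_level=1.0)
effect_tf_absent = variant_effect(reference, 3, "T", tf_level=0.0)
print(effect_tf_present, effect_tf_absent)
```

In this toy setting, a variant that disrupts the motif lowers the predicted accessibility only when the transcription factor is present, which is the kind of cell-state-dependent effect such a model is meant to expose. Because the weights are explicit, the change can be traced back to a specific motif and factor.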

Project Details

Funding Type:

Interdisciplinary Scholar Award

Award Year:

2024

Lead Researcher(s):

Team Members:

William J Greenleaf (Primary sponsor)
Anshul B Kundaje (Co-sponsor)