Generative Principal Component Regression via Variational Inference
IEEE Trans Signal Process. 2026;74:1656-1670. doi: 10.1109/tsp.2026.3682931. Epub 2026 Apr 10.
ABSTRACT
Interest in detecting networks responsible for phenotypes spans multiple scientific disciplines, including proteomics and neuroscience. Latent variable models, such as Principal Component Analysis (PCA), are a natural choice for modeling such networks, as they treat the observed measurements as outputs of a small number of unobserved processes. Unfortunately, these models often fail to incorporate information relevant to the phenotypes of interest, especially when those phenotypes are noisy or have low variance. As a result, when the latent variables are used for prediction, as in principal component regression (PCR), they often perform substantially worse than traditional regression methods. Supervised variational autoencoders (SVAEs) attempt to remedy this by adding a predictive loss to the latent space. However, this supervision introduces a systematic discrepancy between the encoder distribution and the posterior distribution implied by the generative model. Left unaddressed, this mismatch can lead to misleading scientific conclusions and suboptimal intervention strategies when the model is used to guide experimental manipulations. To resolve these issues, we introduce generative principal component regression (gPCR), a novel objective for linear latent variable models that enforces consistency between encoder and posterior while preserving predictive accuracy. gPCR matches the performance of standard regression approaches while retaining the scientifically desirable network interpretation. Using synthetic data, we demonstrate that gPCR learns more realistic loadings and dramatically improves target selection compared to both PCR and SVAEs. We further validate gPCR's utility on two electrophysiology datasets, showing enhanced predictive power and better integration of phenotype-relevant signals into the learned loadings. Finally, in a proteomics study of an Alzheimer's disease cohort, we show that gPCR recovers biologically coherent networks that align more closely with prior experimental findings than traditional regression methods.
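The PCR failure mode described in the abstract can be illustrated with a small synthetic sketch. This is not the paper's method or data. It only shows how ordinary PCR can miss a phenotype driven by a low-variance latent factor while plain least squares on the raw features recovers it. The dimensions, noise scales, and variable names below are assumptions chosen for illustration.

```python
import numpy as np

# Illustrative assumptions: 20 features, 2 high-variance nuisance factors,
# 1 low-variance factor that drives the phenotype y.
rng = np.random.default_rng(0)
n, p, k = 2000, 20, 2

W_big = rng.normal(size=(2, p))            # loadings of the nuisance factors
w_small = rng.normal(size=(1, p))          # loadings of the predictive factor
z_big = rng.normal(scale=5.0, size=(n, 2))
z_small = rng.normal(scale=0.5, size=(n, 1))

X = z_big @ W_big + z_small @ w_small + 0.1 * rng.normal(size=(n, p))
y = z_small[:, 0] + 0.1 * rng.normal(size=n)

# Split into training and held-out halves; center with training statistics.
train, test = slice(0, 1000), slice(1000, None)
mu = X[train].mean(axis=0)
Xc_train, Xc_test = X[train] - mu, X[test] - mu
y_mu = y[train].mean()


def r2(y_true, y_pred):
    """Coefficient of determination on held-out data."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# PCR: project onto the top-k principal directions, then regress y on the scores.
_, _, Vt = np.linalg.svd(Xc_train, full_matrices=False)
V_k = Vt[:k].T
beta_pcr, *_ = np.linalg.lstsq(Xc_train @ V_k, y[train] - y_mu, rcond=None)
r2_pcr = r2(y[test], Xc_test @ V_k @ beta_pcr + y_mu)

# Baseline: ordinary least squares on all centered features.
beta_ols, *_ = np.linalg.lstsq(Xc_train, y[train] - y_mu, rcond=None)
r2_ols = r2(y[test], Xc_test @ beta_ols + y_mu)

print(f"held-out R^2, PCR (k={k}): {r2_pcr:.3f}")
print(f"held-out R^2, OLS:        {r2_ols:.3f}")
```

In this toy setup the top two principal components are dominated by the high-variance nuisance factors, so the PCR scores carry almost no information about y. Because PCA selects directions by variance alone, it does not use the phenotype when choosing components, which is the gap that supervised objectives aim to close.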
PMID:42222061 | PMC:PMC13220988 | DOI:10.1109/tsp.2026.3682931