Advancement: Bayesian Joint Modeling of High-dimensional Dependent Count Data with Application to Microbiome Study

Shuangjie Zhang
Statistical Science PhD Student
Location: Virtual Event
Advisor: Juhee Lee

Join us on Zoom: https://ucsc.zoom.us/j/99362800152?pwd=a0cxdUhFVG1nc09vdGZ2UENyRmE4UT09  / Passcode: 875772

Description: High-dimensional dependent count data routinely arise in application areas such as genomics, and modeling dependent multivariate count data presents statistical challenges. Because the data are discrete, conventional multivariate analysis methods are not directly applicable, and modeling the dependence structure among the counts in a vector is not straightforward. The analyses are often further complicated by sparsity and large heterogeneity across samples. Motivated by microbiome studies, we propose to develop flexible Bayesian models for analyzing multivariate count tables. We first develop a Bayesian zero-inflated rounded log-normal kernel method to model interactions between microbial features in a community using multivariate count data in the presence of covariates and excess zeros. A zero-inflation mixture component is used to account for excess zeros, and regression is used to infer covariate effects on the abundances of microbial features and on the probabilities of their being absent from a sample. More importantly, the method models counts directly and infers the dependence structure between features through a covariance matrix. Joint sparsity is imposed to obtain a reliable estimate of the covariance matrix when the sample size is small. We next propose a model that simultaneously analyzes multiple count tables from different feature groups. The model uses a group factor structure and provides inferences on interactions between microbial features both within a group and across different groups. A doubly sparse structure is also considered, and group-specific and feature-specific factors are used to achieve greater flexibility in modeling a high-dimensional covariance matrix. The model also exploits a flexible mixture model for mean abundances to accommodate the large variability in counts across samples. Lastly, we propose to build a tree-based model that incorporates phylogenetic information on microbial features through a phylogenetic tree. The phylogenetic information can enhance inferences on the abundances of the features and their interactions. We explore different approaches to utilizing this information in modeling dependencies between microbial features.
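
For readers unfamiliar with the first model class, the following is a minimal simulation sketch of a zero-inflated rounded log-normal mechanism for two features. It is an illustration only, not the presenter's model: the function name `simulate_counts`, the parameter values, and the rounding rule (taking the floor of the exponentiated latent Gaussian) are all assumptions chosen for simplicity, and covariates, sparsity priors, and posterior inference are omitted.

```python
import math
import random


def simulate_counts(n_samples, mu, chol, zero_prob, rng):
    """Simulate a count table from a toy zero-inflated rounded log-normal model.

    mu        : latent means, one per feature (hypothetical values)
    chol      : lower-triangular Cholesky factor of the latent covariance
    zero_prob : per-feature probability of a structural zero (feature absent)
    """
    p = len(mu)
    table = []
    for _ in range(n_samples):
        # Correlated latent Gaussian vector: mu + L * eps, with eps ~ N(0, I).
        eps = [rng.gauss(0.0, 1.0) for _ in range(p)]
        latent = [
            mu[j] + sum(chol[j][k] * eps[k] for k in range(j + 1))
            for j in range(p)
        ]
        row = []
        for j in range(p):
            if rng.random() < zero_prob[j]:
                # Zero-inflation component: the feature is absent in this sample.
                row.append(0)
            else:
                # Assumed rounding rule: floor of the exponentiated latent value.
                row.append(math.floor(math.exp(latent[j])))
        table.append(row)
    return table


rng = random.Random(0)
mu = [2.0, 1.5]
# Cholesky factor of the latent covariance [[1.0, 0.8], [0.8, 1.0]].
chol = [[1.0, 0.0], [0.8, 0.6]]
zero_prob = [0.3, 0.5]

counts = simulate_counts(500, mu, chol, zero_prob, rng)
zero_fraction = [
    sum(row[j] == 0 for row in counts) / len(counts) for j in range(len(mu))
]
print(zero_fraction)
```

In this sketch, dependence between the two features enters only through the off-diagonal entry of the latent covariance, while the excess zeros come from a separate mixture component, which mirrors the separation of roles described in the abstract.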