Defense: High-Dimensional Inference and Uncertainty Quantification for Variable Selection, Clustering and Object-Oriented Analysis with Bayesian and Approximate Bayesian Methods

Speaker Name: Rene Gutierrez
Speaker Title: Statistical Science Ph.D. Candidate
Speaker Organization: Statistical Science Ph.D.
Virtual Event

Join us on Zoom / Passcode: 069952

Description: Bayesian computation for high-dimensional problems using MCMC can be extremely slow, since these methods perform costly computations at each iteration of the sampling chain. The problem is aggravated when the data are large. The first part of the presentation proposes a novel dynamic feature partitioned regression (DFP) for efficient online inference in high-dimensional linear regression with large or streaming data. DFP constructs a pseudo-posterior density of the parameters at each time point, so that the pseudo-posterior can be updated quickly and appropriately when a new data block arrives. It then partitions the set of parameters to exploit parallelization for efficient posterior computation. The approach is applied to high-dimensional linear regression models with Gaussian scale mixture priors and spike-and-slab priors, with large parameter spaces and large data, and yields state-of-the-art inferential performance. The algorithm also enjoys theoretical support: the pseudo-posterior densities get arbitrarily close to the full posterior as the data size grows.

The second part is motivated by a multimodal imaging application in which structural/anatomical information from grey matter (GM) and brain connectivity information are available for multiple subjects. The connectivity information takes the form of a brain connectome network derived from functional magnetic resonance imaging (fMRI). We develop a model that predicts a scalar response from multiple objects (fMRI and GM) and identifies brain regions of interest (ROIs) significantly related to the response. The approach is a flexible Bayesian regression framework that exploits the network information of the brain connectome. It also leverages the links between the connectome network and the anatomical information from GM, both to draw inference on significant ROIs and to offer predictive inference on the response. The principled Bayesian framework allows precise characterization of the uncertainty in identifying a region as influential for predicting the response, as well as quantification of the predictive uncertainty for the response. We implement the framework using an efficient MCMC algorithm. Empirical results from simulation studies illustrate substantial inferential and predictive gains of the proposed framework over its competitors.

While the first two parts focus on high-dimensional and object-oriented regression, the third part offers a novel clustering technique for high-dimensional tensors with limited sample sizes. It targets settings where the clusters differ in their covariances rather than their means. The proposed approach transforms each tensor into several matrices to adequately estimate its variability across different modes. It then applies a model-based approximate Bayesian clustering algorithm to the matrices constructed from the original tensor data. Although some information in the data is lost in this transformation, the approach gains substantial computational efficiency and clustering accuracy. A simulation study compares the proposed approach with its competitors on three tasks: estimating the number of clusters, identifying the modal cluster membership, and estimating the probability of misclassification (a measure of clustering uncertainty). An application to tensors obtained from EEG data demonstrates an advantage of the proposed approach over its competitors.

Rajarshi Guhaniyogi