Defense: Distributed Inference and Data Sketching for High Dimensional Spatial Regression Models

Laura Baracaldo
Statistical Science PhD Candidate
Location: Virtual Event
Advisor: Rajarshi Guhaniyogi

Join us on Zoom: https://ucsc.zoom.us/j/98922028398?pwd=RmFFeU12MEI0Y0ZRS215c2lHSnMrQT09 / Passcode: 500116

Description: This work focuses on scaling Monte Carlo (MC) computations for large-scale Bayesian inference in complex spatial models while maintaining adequate point estimation and uncertainty quantification in both inference and prediction.

We first derive a three-step distributed Bayesian inferential framework for multivariate spatial generalized linear mixed effect models (MVspGLMMs) for big data. Then, we introduce Bayesian data sketching for spatially varying coefficient regression models (SVCMs) to obviate the computational challenges presented by large numbers of spatial locations. To address the challenges of analyzing very large spatial data, we compress spatially oriented data by a random linear transformation to achieve dimension reduction and conduct inference on the compressed data. Our approach distinguishes itself from several existing methods for analyzing large spatial data in that it requires neither the development of new models or algorithms nor any specialized computational hardware, while still delivering fully model-based Bayesian inference. Well-established methods and algorithms for spatial regression models can be applied directly to the compressed data. We establish posterior contraction rates for estimating the spatially varying coefficients and predicting the outcome at new locations under the randomly compressed data model. We use simulation experiments and a spatial analysis of remotely sensed vegetation data to empirically illustrate the inferential and computational efficiency of our approach. Importantly, the entire analysis does not require revealing the original data to the analysts, which preserves data privacy.
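
The compression step described above can be illustrated with a minimal sketch. This is not the author's implementation: it assumes a Gaussian sketching matrix with i.i.d. N(0, 1/n) entries (the abstract only says "random linear transformation"), and it replaces the spatially varying coefficient model with an ordinary linear regression with constant coefficients so that the example stays short. The function name `sketch_data` and all variable names are hypothetical.

```python
import numpy as np

def sketch_data(y, X, m, rng):
    """Compress n observations to m < n rows with a random linear map.

    Assumption: entries of Phi are i.i.d. N(0, 1/n). The source does not
    state which random matrix is used.
    """
    n = y.shape[0]
    Phi = rng.normal(loc=0.0, scale=1.0 / np.sqrt(n), size=(m, n))
    # The analyst only ever sees (Phi @ y, Phi @ X), not the raw (y, X).
    return Phi @ y, Phi @ X

rng = np.random.default_rng(0)

# Simulated data (illustrative only): n observations, p covariates.
n, p, m = 2000, 3, 200
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + 0.1 * rng.normal(size=n)

# Compress from n = 2000 rows down to m = 200 rows.
y_s, X_s = sketch_data(y, X, m, rng)

# Any standard fitting routine can now run on the compressed data;
# least squares stands in here for a full Bayesian fit.
beta_hat, *_ = np.linalg.lstsq(X_s, y_s, rcond=None)
print(beta_hat)
```

Because the fitting routine only receives the m compressed rows, its cost depends on m rather than n, and the original observations never need to be shared with whoever runs the analysis.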

Finally, we propose a novel idea that employs data sketching for distributed Bayesian inference. A recent literature on Bayesian SVCMs focuses on spatial variable selection, which faces severe computational and inferential challenges when the data contain a large number of spatial locations. To address these challenges, we introduce a three-stage strategy built on the idea of Bayesian data sketching. The proposed approach addresses spatial variable selection in SVCMs with big data without developing fundamentally new models or algorithms or relying on any specialized computational hardware, while still delivering fully model-based Bayesian inference. One important byproduct of our approach is that it resolves the sensitivity of inference to the selection of data subsets in the distributed Bayesian framework discussed in the second chapter. A simulation study and a real data analysis establish the efficiency of the proposed approach for high-dimensional spatial variable selection and regression surface estimation.