
Statistics Research

The Statistics Department at UC Santa Cruz has a strong reputation in research and graduate education. Faculty expertise includes Bayesian methods, dependent data modeling, machine learning and optimization, data science, and applied statistics. Statistics faculty include four fellows of the American Statistical Association (ASA) and two fellows of the International Society for Bayesian Analysis (ISBA), and several have held leadership roles in both organizations. Graduates of the Statistics Ph.D. program have earned awards for their research and gone on to careers in industry, government, and academia.


Bayesian Methods

The Statistics Department at UC Santa Cruz has well-established expertise in Bayesian methodology and computation, with strengths in hierarchical and nonparametric modeling. Faculty contribute actively to both theoretical developments and interdisciplinary applications of Bayesian methods.

Bayesian nonparametric methods build flexible prior probability models for distributions and functions. These priors support general distributional shapes, non-linear relationships, and complex dependence structures, enabling broader inferences and predictions than customary parametric models. Faculty have contributed to the development and study of nonparametric priors and their applications. Methodological work includes prior models for categorical distributions (such as ordinal responses and sparse multivariate count data), clustering, nonparametric regression (using density regression, Gaussian processes, and neural networks), point process intensities, spectral densities, survival analysis, and temporally or spatially dependent distributions. Applications include microbiome studies, earthquake data analysis, proteomics, risk assessment in toxicity studies, spectral analysis of multichannel EEG recordings, and the study of hurricane landfalls.

Another major Bayesian research area involves hierarchical modeling for complex dependent data structures. Faculty develop cutting-edge hierarchical models to address challenges in noisy high-dimensional data, survey and census data, multi-dimensional temporal data, and spatial or spatio-temporal data. Some methods combine stochastic modeling with information about underlying physical processes and often integrate data from multiple sources. Areas of methodological study and application include biomedical signal processing, climatology, computer simulation experiments, epidemiology, extremes, global health, multivariate time series analysis, official statistics, quantile regression, small area estimation, and spatio-temporal modeling.

The Bayesian modeling methods developed at UC Santa Cruz require advanced computational techniques for inference, including Markov chain Monte Carlo methods and variational algorithms. Prediction is often a key inferential goal. Faculty also explore utility function-based methods for decision-making, with research that includes clinical trial design and utility-based personalized treatment selection.
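As a rough illustration of what a Markov chain Monte Carlo method does, the sketch below implements a random-walk Metropolis sampler, one of the simplest algorithms in that family. It is a generic textbook example, not a method attributed to the department; the standard normal target, the step size, and all function names are assumptions chosen for illustration.

```python
import math
import random


def metropolis(log_density, x0, n_samples, step, rng):
    """Random-walk Metropolis sampler for a one-dimensional target.

    log_density: log of the (possibly unnormalized) target density.
    step: standard deviation of the Gaussian proposal (a tuning choice).
    """
    x = x0
    logp = log_density(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        logp_prop = log_density(proposal)
        # Accept with probability min(1, target(proposal) / target(x)).
        accept_prob = math.exp(min(0.0, logp_prop - logp))
        if rng.random() < accept_prob:
            x, logp = proposal, logp_prop
        samples.append(x)  # on rejection the current state is repeated
    return samples


def log_std_normal(x):
    """Log density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


if __name__ == "__main__":
    draws = metropolis(log_std_normal, 0.0, 20000, 2.4, random.Random(1))
    mean = sum(draws) / len(draws)
    print("posterior mean estimate:", round(mean, 3))
```

Because the sampler only needs density ratios, the normalizing constant of the target is never required, which is what makes this family of methods practical for posterior distributions.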

Faculty: Athanasios Kottas, Herbie Lee, Ju Hee Lee, Richard Li, Paul Parker, Raquel Prado, Bruno Sansó


Modeling Dependent Data

Many data sets involve some form of correlation or dependence. Ignoring this dependence can lead to spurious conclusions. Correlated data fall into three main types: time series, spatial processes, and spatio-temporal processes.
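To make the cost of ignoring dependence concrete, consider a stationary first-order autoregressive, AR(1), series with lag-one autocorrelation phi. This model is an assumed example, not one singled out in the text above. Under independence, the variance of the sample mean of n observations is sigma^2 / n. Under AR(1) dependence, the true variance is larger by the factor computed below, which approaches (1 + phi) / (1 - phi) as n grows.

```python
def variance_inflation(phi, n):
    """Ratio of the true variance of the sample mean of n observations from a
    stationary AR(1) process (lag-one autocorrelation phi) to the variance
    sigma^2 / n obtained by wrongly assuming independence.

    Uses Var(mean) = (sigma^2 / n) * [1 + 2 * sum_{k=1}^{n-1} (1 - k/n) * phi^k].
    """
    return 1.0 + 2.0 * sum((1.0 - k / n) * phi**k for k in range(1, n))


if __name__ == "__main__":
    for phi in (0.0, 0.5, 0.8):
        print(phi, round(variance_inflation(phi, 1000), 2))
```

With phi = 0.8 the limiting factor is 9, so a standard error computed under independence would be too small by a factor of about 3, and confidence intervals would be correspondingly too narrow.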

Time Series

Time series are sequential measurements collected in a time-ordered fashion. Examples include daily temperatures and stock prices, yearly storm counts and agricultural yields, weekly hospital visits and sports team goals, and biomedical or geological signals such as electroencephalograms and earthquake recordings. Research often focuses on understanding and modeling these phenomena, forecasting future values, quantifying trends, and identifying factors that influence the series, along with assessing uncertainty in the conclusions.

Faculty: Athanasios Kottas, Richard Li, Robert Lund, Paul Parker, Raquel Prado, Bruno Sansó

Spatial Processes

Spatial data consist of observations collected across a spatial domain. Examples include land elevation by longitude and latitude, the number of COVID-19 deaths by U.S. county, earthquake locations, and census population counts by city. Research often focuses on modeling these phenomena, forecasting values at unobserved locations, quantifying trends or outliers, and identifying factors that influence the spatial process, while also assessing uncertainty in the conclusions.

Faculty: Sangwon Hyun, Athanasios Kottas, Herbie Lee, Richard Li, Robert Lund, Paul Parker, Raquel Prado, Bruno Sansó

Spatio-temporal Processes

Spatio-temporal data include both spatial (location) and temporal (time) components, allowing researchers to analyze phenomena that change across space and time. Examples include climate measurements across geographical regions, tracking disease outbreaks in a country, and monitoring traffic accident patterns on highways over time. Research often focuses on understanding the phenomena (modeling), forecasting values at unobserved locations, quantifying trends, and identifying factors that influence spatial and temporal patterns, while also assessing uncertainty in the conclusions.

Faculty: Athanasios Kottas, Richard Li, Robert Lund, Paul Parker, Raquel Prado, Bruno Sansó


Machine Learning and Optimization

Machine learning and optimization are central to modern data analysis, allowing researchers to extract meaningful patterns from complex, high-dimensional datasets. Cutting-edge research in our department combines modern machine learning algorithms with rigorous statistical inference and scalable optimization. These innovative approaches offer flexibility to model complex processes, but also bring challenges, such as adapting them for uncertainty quantification and interpretability. Our faculty develop statistical models using a variety of machine learning techniques.

Statistical Neural Networks

Neural networks are powerful tools for modeling complex nonlinear relationships and are often viewed algorithmically rather than as statistical models. Recent work in the department embeds different types of neural networks within a complete statistical model, enabling uncertainty quantification—an important goal for many scientific applications. These new “statistical neural networks” improve predictions in areas such as spatial prediction, small area estimation, and time series forecasting.

Faculty: Sangwon Hyun, Herbie Lee, Robert Lund, Paul Parker

Regularization, Variable Selection, and High-Dimensional Optimization

In high-dimensional datasets, where the number of variables can exceed the number of observations, it can be challenging to identify the most relevant features. Regularization techniques help by adding constraints to the model or by placing appropriate prior distributions on model parameters. This work is grounded in the challenges of high-dimensional optimization, which is important for inference. Faculty conduct research on advanced regularization methods and optimization algorithms tailored for high-dimensional settings, including Bayesian approaches for variable selection and penalized likelihood approaches. This research is especially important in fields like healthcare, where models must be both accurate and interpretable.
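A small sketch can show how a penalty performs variable selection. It assumes the lasso objective (1/2)||y - Xb||^2 + lam * ||b||_1 with an orthonormal design matrix, a special case in which the solution is obtained by soft-thresholding each least-squares coefficient. The function names and the example coefficients are hypothetical, and the sketch covers only the penalized-likelihood side, not the Bayesian approaches mentioned above.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward zero by lam,
    and return exactly zero when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_orthonormal(ols_coefficients, lam):
    """Lasso solution when the design matrix has orthonormal columns:
    each least-squares coefficient is soft-thresholded separately."""
    return [soft_threshold(b, lam) for b in ols_coefficients]


if __name__ == "__main__":
    # Hypothetical least-squares estimates for four predictors.
    ols = [3.0, -0.5, 0.25, -2.5]
    print(lasso_orthonormal(ols, lam=1.0))
```

Coefficients smaller in magnitude than the penalty are set exactly to zero, which is how the penalty selects variables; the remaining coefficients are shrunk toward zero by the same amount.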

Faculty: Sangwon Hyun, Ju Hee Lee, Richard Li, Raquel Prado, Bruno Sansó

Tree-Based and Ensemble Methods

Decision trees are intuitive models that split data based on features to make predictions. While simple, these models are prone to overfitting. Ensemble methods such as random forests or Bayesian additive regression trees combine many trees to create more robust models. This ensemble approach can also be applied beyond trees. For example, combining multiple neural networks can improve predictive performance. Faculty develop new statistical methods involving ensembles of both trees and neural networks.
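The idea of combining many trees can be sketched with bagging (bootstrap aggregation), the resampling step underlying random forests. This is a deliberately minimal illustration, not a method attributed to the department: each "tree" is a one-split regression stump, and the function names, toy data, and seed are assumptions.

```python
import random
from statistics import fmean


def fit_stump(xs, ys):
    """Fit a one-split regression tree by minimizing squared error.
    Returns (threshold, left_mean, right_mean); threshold is None when
    all x values are identical and no split is possible."""
    points = sorted(set(xs))
    overall = fmean(ys)
    best = (float("inf"), None, overall, overall)
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = fmean(left), fmean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    if threshold is None:
        return left_mean
    return left_mean if x <= threshold else right_mean


def fit_bagged_stumps(xs, ys, n_trees, rng):
    """Fit each stump to a bootstrap resample (sampling with replacement)."""
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_ensemble(stumps, x):
    """Average the predictions of all stumps."""
    return fmean(predict_stump(s, x) for s in stumps)


if __name__ == "__main__":
    xs = [i / 20 for i in range(20)]
    ys = [0.0 if x < 0.5 else 1.0 for x in xs]  # toy step function
    stumps = fit_bagged_stumps(xs, ys, 25, random.Random(0))
    for x in (0.0, 0.45, 0.5, 0.95):
        print(x, round(predict_ensemble(stumps, x), 2))
```

Averaging over resampled fits smooths out the dependence of any single tree on the particular sample it saw. Random forests add further randomization over the features considered at each split, and Bayesian additive regression trees instead place a prior on a sum of trees.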

Faculty: Marcela Alfaro Córdoba, Herbie Lee, Richard Li, Paul Parker

Scalable Statistical Computation

Computing is essential to modern statistics, especially with widespread access to computing power. UC Santa Cruz offers high-performance computing facilities that support parallel computer experiments on servers and GPUs. Faculty regularly develop and adapt computational algorithms for scalability, precision, and speed. Topics include expectation-maximization algorithms, convex optimization, Markov chain Monte Carlo techniques, genetic algorithms and stochastic optimization, dynamic programming, path algorithms for sparse regression, and high-performance computing.

Faculty: Sangwon Hyun, Richard Li, Robert Lund, Paul Parker, Raquel Prado, Bruno Sansó


Data Science and Applications

Data-driven analysis plays a pivotal role across many scientific fields and societal challenges. Our faculty are actively involved in interdisciplinary data science research, collaborating with domain researchers to develop tools and datasets for open science, generating new insights, improving decision making, and addressing complex real-world problems. The department is also dedicated to data science education, training the next generation of data scientists and quantitative researchers.

Health Data Science

Many areas of health sciences involve data-intensive research, from the molecular and cellular level to individual patient care and public health monitoring and surveillance. Research often focuses on building sophisticated models to understand biological processes, uncovering complex structures in high-dimensional health data, predicting patient outcomes and population trends, and quantifying uncertainty in decision making.

Faculty: Sangwon Hyun, Athanasios Kottas, Ju Hee Lee, Richard Li, Raquel Prado

Environmental and Ecological Sciences

Statistical models and methods help us understand complex environmental and ecological systems, from climate dynamics and earthquakes to species distributions. Researchers in our department develop advanced models to extract insights from noisy, high-dimensional, and spatio-temporal data, addressing a wide range of challenges, such as forecasting environmental changes and climate events, estimating marine microbial abundance, and predicting soil carbon levels.

Faculty: Marcela Alfaro Córdoba, Sangwon Hyun, Athanasios Kottas, Ju Hee Lee, Robert Lund, Paul Parker, Raquel Prado, Bruno Sansó

Open Science and Data Science Education

Faculty in the department promote data science education inside and outside of the classroom. They engage in broader data science communities and train researchers from other disciplines in data science research through workshops and online training materials. Much of the department’s research also focuses on creating, disseminating, and maintaining user-friendly open-source software for the broader scientific community.

Faculty: Marcela Alfaro Córdoba, Sangwon Hyun, Richard Li, Bruno Sansó