Principal component analysis for network-valued data: The PCAN algorithm and its properties

James Wilson
Assistant Professor
Psychiatry and Biostatistics, University of Pittsburgh
Professor Sheng Jiang

Join us on Zoom:

Description:  In the last decade, network representation learning (NRL) has become a common and important learning task for network-valued data. The goal of network representation learning is to identify low-dimensional representations of an observed network that preserve various aspects of the original graph, such as its topology, vertex attributes, or community structure. In this talk, I consider the problem of interpretable NRL for samples of network-valued data. We propose the Principal Component Analysis for Networks (PCAN) algorithm to identify statistically meaningful low-dimensional representations of a network sample via subgraph count statistics. The PCAN procedure provides an interpretable framework with which one can readily visualize, explore, and formulate predictive models for network samples. We furthermore introduce a fast sampling-based algorithm, sPCAN, which is significantly more computationally efficient than its counterpart but retains the advantages of interpretability. We investigate the relationship between these two methods and analyze their large-sample properties under the common regime where the sample of networks is a collection of kernel-based random graphs. We show that under this regime, the embeddings of the sPCAN method enjoy a central limit theorem and, moreover, that the population-level embeddings of PCAN and sPCAN are equivalent. We assess PCAN’s ability to visualize, cluster, and classify observations in network samples arising in nature, including functional connectivity network samples and dynamic networks describing the political co-voting habits of the U.S. Senate. Our analyses reveal that our proposed algorithm provides informative and discriminatory features describing the networks in each sample.
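
To make the general idea in the description concrete, here is a minimal Python sketch of embedding a sample of networks by running ordinary PCA on a few subgraph counts. It is an illustration under stated assumptions, not the speaker's PCAN or sPCAN implementation: the choice of subgraphs (edges, two-stars, triangles), the standardization step, the simulated Erdős–Rényi sample, and the helper names `subgraph_counts`, `pca_embed`, and `random_graph` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def subgraph_counts(A):
    """Edge, two-star, and triangle counts of a simple undirected graph.

    A is a symmetric 0/1 adjacency matrix with a zero diagonal.
    The set of subgraphs used here is an assumption made for illustration.
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    edges = deg.sum() / 2.0                      # each edge is counted twice
    two_stars = (deg * (deg - 1) / 2.0).sum()    # pairs of neighbors per vertex
    triangles = np.trace(A @ A @ A) / 6.0        # closed 3-walks / 6
    return np.array([edges, two_stars, triangles])

def pca_embed(features, n_components=2):
    """Standardize the columns of `features` and run PCA via the SVD."""
    X = np.asarray(features, dtype=float)
    X = X - X.mean(axis=0)
    scale = X.std(axis=0)
    scale[scale == 0] = 1.0                      # avoid dividing by zero
    X = X / scale
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # one row per network
    loadings = Vt[:n_components]                     # weight of each count
    return scores, loadings

def random_graph(n, p, rng):
    """Simulated Erdős–Rényi graph, used only to build a toy sample."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)

# Toy sample: ten sparse and ten denser networks on 30 vertices.
rng = np.random.default_rng(0)
sample = [random_graph(30, p, rng) for p in [0.1] * 10 + [0.3] * 10]
features = np.array([subgraph_counts(A) for A in sample])
scores, loadings = pca_embed(features)
```

Each row of `scores` is a two-dimensional representation of one network, and each row of `loadings` shows how strongly each subgraph count contributes to a component, which is the sense in which an embedding of this kind can be read off directly.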

Speaker Bio:  I am an Assistant Professor of Psychiatry and Biostatistics at the University of Pittsburgh, and Director of Experimental Design and Data Analysis for the Translational Neuroscience Program. Before Pitt, I was an Associate Professor of Statistics and Data Science at the University of San Francisco. I have a broad background in statistics, biostatistics, and data science with interests in modeling and analyzing complex network data, the development of interpretable unsupervised machine learning techniques, and the modeling and analysis of social and neuroimaging data. Using computational statistics, probability tools from random graph theory, and machine learning techniques, I work to provide simple and interpretable machinery to model, explore, and analyze interacting network-valued systems. I aim to make sense of complex data like that arising in functional connectivity and social media, while furthermore demystifying complex models like contemporary deep learning models on networks. I am particularly interested in understanding the interplay between social dynamics, neuro-biological systems, behavior, and disease.