On a campus filled with exciting events and speakers, two stand out: Questions that Matter: Data and Democracy (January 29, 2019, 7pm at Kuumbwa Jazz Center) and the 2019 Faculty Research Lecture: Responsible Data Science (February 26, 2019, 7pm, Music Recital Hall). Both feature UC Santa Cruz Professor Lise Getoor, and both address the promises and challenges of data science, a field fast becoming one of the strongest forces shaping society today and an area of expertise at the Baskin School of Engineering at UC Santa Cruz.
Getoor is a data science legend. She’s given keynotes at major conferences all over the world before crowds of thousands of experts in areas as diverse as artificial intelligence, statistics, and database systems. Her lab created an open-source tool called Probabilistic Soft Logic, used for everything from energy disaggregation and hybrid recommender systems to the analysis of human trafficking.
“My research mixes tools from different areas,” Getoor told us. “Theoretical and mathematical. It makes use of logic and probability to model networks, and takes into account context and structure.”
Data science aims to extract useful insights from the huge amounts of information created by modern society. Context, as Getoor’s work has revealed, is crucial. Most approaches to analyzing data require extracting information from one database and placing it in another (such as a spreadsheet), a process that can flatten intricate structures within the databases that might have revealed important insights.
Getoor heads two major data science projects at UC Santa Cruz. In 2017, the National Science Foundation awarded Getoor and a group of other UC Santa Cruz computer scientists, statisticians, and mathematicians $1.5 million as part of the Transdisciplinary Research in Principles of Data Science (TRIPODS) program, which is an effort to develop the theoretical principles of the field. Getoor’s group looks at the challenges of incompleteness, uncertainty, and bias in large, heterogeneous sets of interconnected data.
“For a long time there’s been an active informal data science group on campus,” Getoor said. “Under TRIPODS there’s fascinating work being done: [Associate Dean] Abel Rodriguez and Raj Guhaniyogi use sophisticated Bayesian statistical models to answer social science questions; Abhradeep Thakurta works with differential privacy and biological data and genomic data; Sesh Comandur has been doing research on efficient estimation of graph properties and collaborated with companies including Twitter; Dimitris Achlioptas works on randomized algorithms; and Daniele Venturi brings methods from uncertainty quantification and stochastic processes to the collaboration. Elsewhere, statistics Professor Raquel Prado does interesting work on MRIs and brain scanning, and several folks across BSOE do important research on temporal and spatial statistics.”
Getoor also directs the D3 Data Science Research Center at UC Santa Cruz, a collaboration between academia and industry designed to develop other open-source tools for collecting data, discovering patterns, and making decisions. Collaboration across disciplines is also important, particularly with respect to ethical and social concerns.
“There’s been an explosion of interest in high-stakes decision-making [involving data science],” Getoor said. “Some common examples are recidivism prediction, where there is an urgent need to address fairness and bias, as well as other tasks like loans and automated hiring decisions. Perspective is important: we need to have conversations about privacy and ethics while we develop these powerful new algorithmic tools.”
As the risks and benefits of insights developed by data science spread through society, Getoor sees a growing need for data science literacy and for recognition that there are serious limits to what artificial intelligence and algorithmic insights can achieve.
“Algorithms aren’t a magic bullet for all society’s ills,” she said. “Doing good data science requires a collaborative and curious outlook so you’re collaborating with the people who will be using and affected by the system, and at the same time understand the powers and limitations of data science. And it’s important to communicate that at all levels.”
She hopes her talks can become the beginning of a broader conversation about data science within society.
Catch Lise Getoor’s talks at Questions that Matter: Data and Democracy on January 29, 2019, at 7pm at the Kuumbwa Jazz Center and at the 2019 Faculty Research Lecture: Responsible Data Science on February 26, 2019, at 7pm in the Music Recital Hall.