NSF award will support project to promote reproducibility in computer science

Carlos Maltzahn (photo by Sullivan Gaudreault).
ecerf@ucsc.edu (Emily Cerf)
With the support of a three-year, $900,000 grant from the National Science Foundation (NSF), Adjunct Professor of Computer Science and Engineering Carlos Maltzahn and the UC Santa Cruz Center for Research in Open Source Software (CROSS) will participate in collaborative research to increase the reproducibility of computer science research. 

This grant comes from the inaugural year of an NSF initiative called Findable Accessible Interoperable Reusable (FAIR) Open Science Research Coordination Networks (FAIROS RCN), which creates groups of researchers who lead by example in promoting open science results and artifacts. In total, 10 new project groups were funded from a pool of $12.5 million to create open source communities that foster a vibrant exchange of artifacts within common infrastructure.

“There's a huge shift going on,” said Maltzahn, director of UCSC CROSS. “I think it has to do with the realization of how much value the industry places on open source, and that open science and networks of expertise have to be more inclusive and involve stakeholders across academia, industry, government and open source communities. That becomes especially important when you talk about revitalizing U.S. high tech manufacturing.” 

Maltzahn will work with the Repeto project, a group focused on practical reproducibility in computer science research. Reproducibility allows researchers to verify findings, accelerate the research process to more quickly gain insights, and have their products more widely used in research labs, classrooms, and industry. It also helps students gain a deeper understanding of the original researcher’s thought process.

The Repeto project, which involves researchers from the University of Chicago and New York University, strives to make reproducibility in computer science practical, meaning that many experiments can be repeated cost-effectively. To that end, the team will create infrastructure, teach and mentor students, lead workshops, and develop community best practices.

Through these efforts, Maltzahn and his collaborators hope to better understand and foster the “market of reproducibility” to ensure not only that artifacts, such as pieces of software, are available for replication, but also that those artifacts are both useful and used.

“The aim of Repeto is that both creating reproducible artifacts is really easy, and consuming those artifacts is really easy,” Maltzahn said. “The overall thought is that convenient reproducibility artifacts will accelerate the cycle of research, so you will get a much faster succession of insights and a powerful toolkit to improve student training.” 

UCSC’s role in this project will be to convene a worldwide program in 2023 called the “Summer of Reproducibility,” following the model of CROSS’s Open Source Research Experience program, which matches students with mentors working on open source projects. Similarly, undergraduate students participating in the Summer of Reproducibility will work with mentors to replicate a published piece of research.
 
This will give students a deeper understanding of the experiments they repeat than they would gain from just reading about them, and will help the mentors better understand what is needed for their work to be truly reproducible.

Maltzahn will collaborate with Assistant Director of UCSC CROSS Stephanie Lieggi to put on the Summer of Reproducibility. He will also collaborate with lead principal investigator Kate Keahey, senior computer scientist at Argonne National Lab and CASE Senior Scientist affiliated with the Department of Computer Science at the University of Chicago; Haryadi Gunawi, associate professor of computer science at the University of Chicago; and Fraida Fund, research assistant professor in electrical and computer engineering at NYU.

The University of Chicago researchers will focus on building and maintaining the infrastructure to make practical reproducibility possible. They will also convene workshops on topics around reproducibility. NYU will focus on best practices for teaching and applying reproducibility in the classroom.

Drawing on connections with researchers involved in the Association for Computing Machinery Emerging Interest Group for Reproducibility effort, Maltzahn and his team have created an international steering committee for the project, involving people across disciplines including computer science, library science, and social science.

All 10 FAIROS RCN awardee groups are expected to work together in sharing artifacts and will have monthly meetings led by the program director. The San Diego Supercomputer Center (SDSC) at UC San Diego is also part of the cohort, making the UC system a major participant in this initiative. Members of the cohort such as North Carolina Central University provide exciting outreach opportunities to students and faculty at Historically Black Colleges and Universities (HBCUs) and Minority-Serving Institutions (MSIs).