CROSS-sponsored project merges with major open source software framework Apache Arrow

Apache Arrow + SkyHookDM
By: Melissa Weckerle

Skyhook Data Management (SkyhookDM), a UC Santa Cruz Center for Research in Open Source Software (CROSS)-sponsored incubator project, has been integrated into the latest 7.0.0 release of Apache Arrow, an open source software framework and ecosystem widely used in data science for accessing, processing, and communicating large table-based datasets. 

SkyhookDM, supported by CROSS members and by grants from the National Science Foundation (NSF) and the U.S. Department of Energy (DOE), was developed to greatly decrease the client workload involved in filtering and decoding large datasets in storage systems.  

“The goal of SkyhookDM is to reduce client-side resource utilization in terms of CPU, memory bandwidth, and network utilization by offloading data management and processing tasks to the storage layer,” said Jayjeet Chakraborty, a computer science and engineering Ph.D. student who is standing in for project leader Jeff LeFevre while LeFevre is on bereavement leave.

SkyhookDM is a plugin for the widely used open source distributed storage system Ceph, created by UC Santa Cruz alumnus Sage Weil (Ph.D. ’07, computer science and engineering). The plugin provides a data-management interface for large table-based datasets by building upon Ceph’s programmable storage features. SkyhookDM has three main components: a storage layer, a client layer, and a protocol that supports back-end storage communication. Together they make more efficient use of both hardware and software resources, reduce the time clients spend decoding files and filtering data, and reduce CPU usage by scaling processing within the storage layer.
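
For readers who want a concrete picture of what offloading to the storage layer means, here is a minimal toy sketch in Python. It is not SkyhookDM's or Arrow's actual API: the `StorageNode` class, its `read_all` and `scan` methods, the `cells` helper, and the sample data are all hypothetical, invented only to show how filtering and selecting columns at the storage side can shrink what travels to the client.

```python
# Toy model only: every name here is hypothetical, not SkyhookDM's or Arrow's API.
from dataclasses import dataclass


@dataclass
class StorageNode:
    rows: list  # each row is a dict of column name -> value

    def read_all(self):
        """Client-side processing: ship every row and column to the client."""
        return list(self.rows)

    def scan(self, columns, predicate):
        """Offloaded processing: filter and project before sending anything."""
        return [{c: r[c] for c in columns} for r in self.rows if predicate(r)]


def cells(rows):
    """Count values transferred, a rough stand-in for network traffic."""
    return sum(len(r) for r in rows)


node = StorageNode([{"id": i, "fare": i * 1.5, "city": "SC"} for i in range(1000)])

# Baseline: the client pulls everything, then filters and projects locally.
pulled = node.read_all()
client_result = [{"id": r["id"]} for r in pulled if r["fare"] > 1400]

# Offloaded: the storage node applies the same filter and projection.
offloaded_result = node.scan(["id"], lambda r: r["fare"] > 1400)

# Both paths give the same answer; only the amount of data moved differs.
assert client_result == offloaded_result
print(cells(pulled), cells(offloaded_result))  # prints "3000 66"
```

In this toy case the client receives 66 values instead of 3,000 for the same result. Real savings depend on the dataset, the query, and the cluster.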

CROSS researchers who contributed to this project include Baskin Engineering alums Ivo Jimenez (Ph.D. ’19, computer science and engineering), Jeff LeFevre (Ph.D. ’14, computer science and engineering), Michael Sevilla (Ph.D. ’18, computer science and engineering), and Noah Watkins (Ph.D. ’18, computer science and engineering).

To learn more about SkyhookDM and how to deploy Arrow with SkyhookDM enabled, visit this web page.