Hyoukjun Kwon

Research Scientist, Reality Labs

Meta

Description: Deep learning (DL) inference accelerators, custom chips with hardware architectures tailored for DL inference, have emerged to provide high efficiency (i.e., computational performance and energy efficiency) for DL inference. However, this specialization strategy is challenged by the rapid evolution of DL models (hundreds of new models every month) combined with relatively slow hardware development cycles (1-2 years). A DL inference accelerator specialized for the set of models that existed a year ago, when the chip was designed, struggles to achieve the same degree of efficiency on the new models that exist when the chip is deployed, because new models often include new DL operators and involve different tensor shapes even for the same operators. This implies that some degree of flexibility in DL accelerators is desirable for adaptivity, but that flexibility should be kept minimal so that it does not erode the efficiency gained from specialization. In my research, I identified that the capability to orchestrate data (i.e., scheduling data movement and staging) within an accelerator is the key to flexibility, and I explored ways to enable such flexibility with minimal extra hardware components. In this talk, I will first discuss how the capability for data orchestration provides flexibility and impacts overall efficiency. I will use MAESTRO (MICRO 2019, Top Picks) to show the complex design space of DL models, DL accelerator architectures, and data orchestration (a.k.a. dataflow or compiler mapping), and to show the potential benefits of flexibility. Next, I will discuss ways to enable data orchestration flexibility within a DL accelerator through reconfigurability (MAERI, ASPLOS 2018) and heterogeneity (Herald, HPCA 2021).
Finally, I will discuss how I plan to extend the design space of computer systems considering the balance of flexibility and specialization to enable (1) heterogeneous AI on things (AIoT) systems and (2) future general-purpose computer systems with heterogeneous accelerators.

Speaker Bio: Hyoukjun Kwon is a research scientist at Reality Labs, Meta. He received his PhD in Computer Science from the Georgia Institute of Technology in 2020. His primary research area is computer architecture. His main research interests include AI accelerator architecture, accelerator dataflow optimization and modeling, interconnection networks, AI model-compiler-accelerator co-design, and design automation. His flexible-dataflow AI accelerator architecture, MAERI (ASPLOS 2018), received an honorable mention in IEEE Top Picks in Computer Architecture Conferences in 2018. His data-centric approach to modeling and analyzing accelerator dataflow, MAESTRO, was selected as one of the IEEE Top Picks in Computer Architecture Conferences in 2019. His thesis work on the data-centric approach to designing AI accelerators was recognized with an honorable mention for the ACM SIGARCH/IEEE CS TCCA Outstanding Dissertation Award.



