At Google, I work on profile-guided optimizations for datacenter workloads. My current work focuses on profile-guided optimizations for heap allocation in non-moving memory allocators. I’m interested in hardware-software co-design and participate in the RISC-V CTR TG discussions. I partner with academic researchers and have served on the program committee of CGO. Previously, my work on code layout was featured on Phoronix and discussed on Hacker News. As part of a larger team, I worked on Propeller, a post-link optimization framework. An academic paper describing our work received a Distinguished Paper Award at ASPLOS 2023. Most of my work is part of open-source projects such as LLVM, DynamoRIO and TCMalloc.
As a PhD student, I worked with the Architecture Research Group at Simon Fraser University, advised by Dr. Arrvindh Shriraman. I was also supervised by Dr. Nick Sumner and interned with Dr. Viji Srinivasan at IBM. My research can be broadly described as generalized methods for application-specific hardware specialization. I have worked on cache memory systems, coherence protocols, workload characterization and application-specific hardware specialization. The semiconductor industry specializes hardware for better performance and energy efficiency, but this creates challenges in deciding what to specialize and how to integrate specialized units. Current methods require manual effort to restructure workloads. My research focused on automated compiler techniques for specialization. I developed program analysis techniques to decide what to specialize and synthesized an accelerator workload suite to help researchers. My work is available as open-source software. I also researched ways to reduce the energy consumed by data movement and designed adaptive caching mechanisms. My academic research has been published at the following conferences: HPCA’18, HPCA’17, IISWC’16, MICRO’16, ICS’16, ISCA’15, ICS’15, ISCA’13, MICRO’12.
MICRO. Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy. In 45th Annual IEEE/ACM International Symposium on Microarchitecture, Dec 2012.
ISCA. Fusion: Design Tradeoffs in Coherent Cache Hierarchies for Accelerators. In 42nd Annual International Symposium on Computer Architecture, Dec 2015.
HPCA. Needle: Leveraging Program Analysis to Extract Accelerators from Whole Programs. In 23rd ACM International Conference on High Performance Computer Architecture, Feb 2017.
ASPLOS. Propeller: A Profile Guided, Relinking Optimizer for Warehouse-Scale Applications. In 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Mar 2023. Distinguished Paper Award.