Snehasish Kumar

about

At Google, I work on performance analysis and optimization, leading initiatives on compiler-driven, profile-guided optimizations for data layout. My work has led to significant improvements in the throughput and latency of Google’s largest datacenter workloads. For example, my work on code layout was featured on Phoronix and discussed on Hacker News. I have contributed to Propeller, a post-link optimization framework; the Propeller paper received a Distinguished Paper Award at ASPLOS 2023. I’m deeply involved in hardware-software co-design and actively participate in the RISC-V community. As Vice-Chair of the RISC-V Performance Analysis SIG, I contribute to various task groups and have made significant contributions to the Control Transfer Records specification. I also contribute to other specifications initiated by the Performance Analysis SIG, such as Performance Event Sampling. Most of my work is open source and available through projects such as LLVM, DynamoRIO, and TCMalloc.

academic research

As a PhD student, I conducted research on cache memory systems, coherence protocols, workload characterization, and application-specific hardware specialization. The semiconductor industry specializes hardware for better performance and energy efficiency, but this creates challenges in deciding what to specialize and how to integrate the specialized units. Current methods require manual effort to restructure workloads. My research focused on automated compiler techniques for specialization: I developed program analysis techniques to address the problem and synthesized an accelerator workload suite to help researchers. I also researched ways to reduce the energy consumed by data movement and designed adaptive caching mechanisms. My academic research has been published at top-tier conferences, including HPCA’18, HPCA’17, IISWC’16, MICRO’16, ICS’16, ISCA’15, ICS’15, ISCA’13, and MICRO’12.

selected publications

MICRO

Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy

Snehasish Kumar, Hongzhou Zhao, Arrvindh Shriraman, Eric Matthews, Sandhya Dwarkadas, and Lesley Shannon

In 45th Annual IEEE/ACM International Symposium on Microarchitecture, Dec 2012

ISCA

Fusion: Design Tradeoffs in Coherent Cache Hierarchies for Accelerators

Snehasish Kumar, Arrvindh Shriraman, and Naveen Vedula

In 42nd Annual International Symposium on Computer Architecture, Jun 2015

HPCA

Needle: Leveraging Program Analysis to Extract Accelerators from Whole Programs

Snehasish Kumar, Nick Sumner, Vijayalakshmi Srinivasan, Steve Margerm, and Arrvindh Shriraman

In 23rd IEEE International Symposium on High Performance Computer Architecture, Feb 2017

ASPLOS

Propeller: A Profile Guided, Relinking Optimizer for Warehouse-Scale Applications

Han Shen, Krzysztof Pszeniczny, Rahman Lavaee, Snehasish Kumar, Sriraman Tallam, and Xinliang (David) Li

In 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Mar 2023

Distinguished Paper Award

CGO

Towards Threading the Needle of Debuggable Optimized Binaries

Cristian Assaiante, Simone Di Biasio, Snehasish Kumar, Giuseppe Antonio Di Luna, Daniele Cono D’Elia, and Leonardo Querzoni

In 23rd IEEE/ACM International Symposium on Code Generation and Optimization (CGO), Mar 2026

CC

RIFS: Run-Time Invariant Function Specialization

Saba Jamilan, Snehasish Kumar, and Heiner Litz

In 35th ACM SIGPLAN International Conference on Compiler Construction, Jan 2026
