Research Staff Member, AI Hardware, IBM Thomas J. Watson Research Center
Ph.D.   2017-2020, Computer Science, Georgia Institute of Technology
M.S.     2014-2017, Computer Science, Rice University
B.Tech. 2008-2012, Computer Science and Engineering, IIT Hyderabad
email: prasanth at ibm dot com
I’m currently a research staff member in the AI Hardware group at the IBM T.J. Watson Research Center, focusing on compilers and micro-architecture for next-generation AI hardware technologies.
I recently finished my Ph.D. under the supervision of Vivek Sarkar and Jun Shirako in the Habanero Extreme Scale Software Research Laboratory at Georgia Tech in Atlanta, GA. My research focuses on advancing compiler optimizations for high-performance applications on general-purpose and domain-specific parallel architectures. In the last two years, I have focused on advancing compilers for mapping deep learning (DNN) operators onto flexible spatial accelerators, specialized SIMD units (e.g., Xilinx Versal AI Engine), and thread-migratory architectures (e.g., EMU). In the past, I focused on enhancing traditional compilation techniques for optimizing and debugging both sequential and explicitly parallel programs on modern general-purpose architectures (e.g., multi-core CPUs, SIMD units, and GPUs).
Nov 2020 -- current: Research Staff Member at IBM Research
Summer 2019: Machine Learning & Compiler Research Intern in Kees Visser's group at Xilinx Research Labs, working on the Xilinx Versal architecture
Summer 2018: Multi-Core Heterogeneous Compiler Intern in Vinod Kathail's group, working on compilers for the Xilinx Versal architecture
May 2012 -- Aug 2014: Software Engineer at Microsoft, Hyderabad
This line of work focuses on developing cost models and compiler technologies for optimizing rapidly emerging DNN operators (the building blocks of deep learning models, e.g., CONV2D and GEMM) onto flexible spatial architectures and specialized SIMD units (e.g., Xilinx Versal AI Engine).
This project focuses on advancing compiler optimizations for graph analytics and sparse linear algebra on a thread-migratory architecture (EMU) designed for weak-locality applications.
This project focuses on the systematic integration of multiple storage transformations (e.g., renaming techniques) with loop transformations in a single framework, so that their benefits can be combined to improve program performance.
This project is motivated by the observation that software with explicit parallelism is on the rise. Our work focuses on extending compiler analysis techniques to debug and optimize explicitly parallel programs (loop-level, task-level, and SPMD-style).