LLM infrastructure performance and hardware/software co-design: Conducted analysis and co-design of current and future AI accelerators for Google’s large-scale distributed AI infrastructure, particularly to accommodate emerging large language models (LLMs).
Software development for fleet reliability monitoring: Developed a system to monitor silent data corruption (SDC) in the production fleet.
Ph.D. Student
Yale University
Security of cloud FPGAs, FPGA and hardware security, cloud infrastructure, and hardware accelerator design and RTL development for ML and cryptography.
Software Engineer Intern
Google
Designed and implemented support for running OCI containers in Borg (Google’s cluster manager, which runs almost all jobs).