I am a principal research engineer at Amazon. I work on our NeMo-Megatron-based distributed training stack, which is used to train the Amazon Nova models on thousands of AI accelerators on AWS. My current areas of interest are distributed training, ML systems, architecture exploration, and scaling laws for LLM pre-training.
Previously, I was part of AWS AI Labs, where I worked on large-scale diffusion models. Before that, I worked on few-shot learning, meta-learning, and hyperparameter optimization methods for visual recognition tasks. Please see my publications for more details. I obtained my MS from Columbia University and my BS from Jadavpur University, both in Computer Science.
Selected Publications
The Amazon Nova Family of Models: Technical Report and Model Card (arXiv) — Amazon AGI. [paper]
On the Scalability of Diffusion-Based Text-to-Image Generation (CVPR 2024) — Hao Li, Yang Zou, Ying Wang, Orchid Majumder, Yusheng Xie, R Manmatha, Ashwin Swaminathan, Zhuowen Tu, Stefano Ermon, Stefano Soatto. [paper]