Srinadh Bhojanapalli

I am a staff research scientist in Gemini Team at Google DeepMind in New York. My main research focus these days is on Efficient Long Context methods for Gemini models focusing on 1) Efficient attention (Sparse Transformers, Treeformer) and 2) Long context generalization (FIRE). In particular, I am excited about our recent works enabling Transformers to achieve SOTA length generalize on Arithmetic tasks (Position Coupling, CoT).

Earlier I was a research assistant professor at TTI Chicago. I obtained my PhD in ECE at The University of Texas at Austin where I was advised by Prof. Sujay Sanghavi. Before coming to UT, I've spent four wonderful years pursuing my undergraduate studies at Indian Institute of Technology Bombay.

Contact: bsrinadh [at] google [dot] com.

News

November 2024 - Neurips 2024 accepted our paper on position coupling that improves length generalization of Transformer on addition upto 500 digits!
November 2024 - 2 new exciting papers.
- Multi-level coupling and CoT that enable Transformers to length generalize on multi operand addition and 2 operand multiplication for the first time!
- Mimetic Initialization for Mamba SSMs that enable length generalization properties on copying style tasks showing Mamba models with standard state sizes suffer more from training challenges than expressiveness.
July 2024 - We have a new paper introducing position coupling that improves length generalization of Transformer on addition upto 500 digits! This will be presented at the LCFM workshop at ICML 2024.
May 2024 - I will be attending ICLR 2024. Ping me if you will be around, will be great to meetup!
Feb 2024 - Visiting UT Austin to present our latest work on Long Context Foundational Models.
2 papers accepted to ICLR 2024.
- A novel position encoding approach FIRE to enable length generalization in Transformers.
- Scaling Dual encoder training to large scale XMC problems - Efficacy of Dual-Encoders for Extreme Multi-Label Classification

Current/former Interns.

Hanseul Cho (Grad student at KAIST)
Asher Trockman (Grad student at CMU)
Shanda Li (Grad student at CMU)
Samy Jelassi (Grad student at Princeton)
Zhiyuan Li (Grad student at Princeton)
Wei Hu (Asst. Prof. at U. Michigan)
Chen Zhu (Research Scientist at Google)
Jingzhao Zhang (Asst. Prof. at Tsinghua)
Chulhee “Charlie” Yun (Asst. Prof. at KAIST)
Ayush Sekhari (Postdoc at MIT)

Papers

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, LongContext, and Next Generation Agentic Capabilities
Gemini Team, Google
[geminiv25_report.pdf]
Spark Transformer: Reactivating Sparsity in FFN and Attention
Chong You, Kan Wu, Zhipeng Jia, Lin Chen, Srinadh Bhojanapalli, Jiaxian Guo, Utku Evci, Jan Wassenberg, Praneeth Netrapalli, Jeremiah J. Willcock, Suvinay Subramanian, Felix Chern, Alek Andreev, Shreya Pathak, Felix Yu, Prateek Jain, David E. Culler, Henry M. Levy, Sanjiv Kumar
[Arxiv]
Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count
Hanseul Cho, Jaeyoung Cha, Srinadh Bhojanapalli, Chulhee Yun
ICLR 2025
[Arxiv]
Mimetic Initialization Helps State Space Models Learn to Recall
Asher Trockman, Hrayr Harutyunyan, J. Zico Kolter, Sanjiv Kumar, Srinadh Bhojanapalli
Workshop on Weight Space Learning at ICLR 2025
[Arxiv]
Position Coupling: Leveraging Task Structure for Improved Length Generalization of Transformers
Hanseul Cho, Jaeyoung Cha, Pranjal Awasthi, Srinadh Bhojanapalli, Anupam Gupta, Chulhee Yun
Neurips 2024
Long Context Foundation Models Workshop at ICML 2024
[Arxiv]
Efficient Language Model Architectures for Differentially Private Federated Learning
Jae Hun Ro, Srinadh Bhojanapalli, Zheng Xu, Yanxiang Zhang, Ananda Theertha Suresh
Privacy Regulation and Protection in Machine Learning Workshop at ICLR 2024
[Arxiv]
HiRE: High Recall Approximate Top-k Estimation for Efficient LLM Inference
Yashas Samaga B L, Varun Yerram, Chong You, Srinadh Bhojanapalli, Sanjiv Kumar, Prateek Jain, Praneeth Netrapalli
[Arxiv]
Functional Interpolation for Relative Positions Improves Long Context Transformers
Shanda Li, Chong You, Guru Guruganesh, Joshua Ainslie, Santiago Ontanon, Manzil Zaheer, Sumit Sanghai, Yiming Yang, Sanjiv Kumar, Srinadh Bhojanapalli
ICLR 2024
[Arxiv]
Efficacy of Dual-Encoders for Extreme Multi-Label Classification
Nilesh Gupta, Devvrit Khatri, Ankit S Rawat, Srinadh Bhojanapalli, Prateek Jain, Inderjit S Dhillon
ICLR 2024
[Arxiv]
Depth Dependence of muP Learning Rates in ReLU MLPs
Samy Jelassi, Boris Hanin, Ziwei Ji, Sashank J Reddi, Srinadh Bhojanapalli, Sanjiv Kumar
[Arxiv]
On student-teacher deviations in distillation: does it pay to disobey?
Vaishnavh Nagarajan, Aditya Krishna Menon, Srinadh Bhojanapalli, Hossein Mobahi, Sanjiv Kumar
Neurips 2023
[Arxiv]
Large Models are Parsimonious Learners: Activation Sparsity in Trained Transformers
Zonglin Li, Chong You, Srinadh Bhojanapalli, Daliang Li, Ankit Singh Rawat, Sashank J Reddi, Ke Ye, Felix Chern, Felix Yu, Ruiqi Guo, Sanjiv Kumar
ICLR 2023
[Arxiv]
Treeformer: Dense Gradient Trees for Efficient Attention Computation
Lovish Madaan, Srinadh Bhojanapalli, Himanshu Jain, Prateek Jain
ICLR 2023
[Arxiv]
On the Adversarial Robustness of Mixture of Experts
Joan Puigcerver, Rodolphe Jenatton, Carlos Riquelme Ruiz, Pranjal Awasthi, Srinadh Bhojanapalli
Neurips 2022
[Arxiv]
Teacher's pet: understanding and mitigating biases in distillation
Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar
TMLR 2022
[Arxiv]
Robust Training of Neural Networks Using Scale Invariant Architectures
Zhiyuan Li, Srinadh Bhojanapalli, Manzil Zaheer, Sashank Reddi, Sanjiv Kumar
ICML 2022
[Arxiv]
Leveraging redundancy in attention with Reuse Transformers
Srinadh Bhojanapalli*, Ayan Chakrabarti*, Andreas Veit, Michal Lukasik, Himanshu Jain, Frederick Liu, Yin-Wen Chang, Sanjiv Kumar
Preprint 2021
[Arxiv]
Eigen Analysis of Self-Attention and its Reconstruction from Partial Computation [*]
Srinadh Bhojanapalli, Ayan Chakrabarti, Himanshu Jain, Sanjiv Kumar, Michal Lukasik, Andreas Veit
Preprint 2021
[Arxiv]
Demystifying the Better Performance of Position Encoding Variants for Transformer
Pu-Chin Chen*, Henry Tsai*, Srinadh Bhojanapalli*, Hyung Won Chung, Yin-Wen Chang, Chun-Sung Ferng
EMNLP 2021
[Arxiv]
Understanding Robustness of Transformers for Image Classification [*]
Srinadh Bhojanapalli, Ayan Chakrabarti, Daniel Glasner, Daliang Li, Thomas Unterthiner, Andreas Veit
ICCV 2021
[Arxiv]
On the Reproducibility of Neural Network Predictions
Srinadh Bhojanapalli, Kimberly Wilber, Andreas Veit, Ankit Singh Rawat, Seungyeon Kim, Aditya Menon, Sanjiv Kumar
Preprint 2021
[Arxiv]
Modifying Memories in Transformer Models
Chen Zhu, Ankit Singh Rawat, Manzil Zaheer, Srinadh Bhojanapalli, Daliang Li, Felix Yu, Sanjiv Kumar
Preprint 2020
[Arxiv]
Coping with Label Shift via Distributionally Robust Optimisation
Jingzhao Zhang, Aditya Menon, Andreas Veit, Srinadh Bhojanapalli, Sanjiv Kumar, Suvrit Sra
ICLR 2021
[Openreview], [Arxiv]
An efficient nonconvex reformulation of stagewise convex optimization problems
Rudy Bunel, Oliver Hinder, Srinadh Bhojanapalli, Krishnamurthy (Dj)Dvijotham
Neurips 2020
[Arxiv]
Semantic label smoothing for sequence to sequence problems
Michal Lukasik, Himanshu Jain, Aditya Menon, Seungyeon Kim, Srinadh Bhojanapalli, Felix Yu and Sanjiv Kumar
EMNLP 2020
[Arxiv]
O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers
Chulhee Yun, Yin-Wen Chang, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar
Neurips 2020
[Arxiv].
Does label smoothing mitigate label noise?
Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar
ICML 2020
[Arxiv].
Low-Rank Bottleneck in Multi-head Attention Models
Srinadh Bhojanapalli, Chulhee Yun, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar
ICML 2020
[Arxiv].
Are Transformers universal approximators of sequence-to-sequence functions?
Chulhee Yun, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar
ICLR 2020
[Openreview].
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You, Jing Li, Sashank Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, James Demmel, Kurt Keutzer, Cho-Jui Hsieh
ICLR 2020
[Openreview], [Arxiv].
The role of over-parametrization in generalization of neural networks
Behnam Neyshabur, Zhiyuan Li, Srinadh Bhojanapalli, Yann LeCun, Nathan Srebro
ICLR 2019
[Openreview], [Arxiv].
Smoothed analysis for low-rank solutions to semidefinite programs in quadratic penalty form [*]
Srinadh Bhojanapalli, Nicolas Boumal, Prateek Jain, Praneeth Netrapalli
COLT 2018
[Arxiv].
A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks
Behnam Neyshabur, Srinadh Bhojanapalli, Nathan Srebro
ICLR 2018
[Arxiv].
Exploring Generalization in Deep Learning
Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, Nathan Srebro
NIPS 2017
[Arxiv].
Implicit Regularization in Matrix Factorization
Suriya Gunasekar, Blake Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro
NIPS 2017
Stabilizing GAN Training with Multiple Random Projections
Behnam Neyshabur, Srinadh Bhojanapalli, Ayan Chakrabarti
preprint 2017, [Arxiv], [Project website]. [Arxiv], [slides].
Single Pass PCA of Matrix Products
Shanshan Wu, Srinadh Bhojanapalli, Sujay Sanghavi, Alex Dimakis
NIPS 2016
[Arxiv], [SPARK code].
Global Optimality of Local Search for Low Rank Matrix Recovery
Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro
NIPS 2016
[Arxiv].
Provable Burer-Monteiro factorization for a class of norm-constrained matrix problems
Dohyung Park, Anastasios Kyrillidis, Srinadh Bhojanapalli, Constantine Caramanis, Sujay Sanghavi
preprint 2016, [Arxiv].
Dropping Convexity for Faster Semi-definite Optimization
Srinadh Bhojanapalli, Anastasios Kyrillidis, Sujay Sanghavi
COLT 2016
[Arxiv], [COLT], [slides].
A New Sampling Technique for Tensors
Srinadh Bhojanapalli, Sujay Sanghavi
preprint 2015, [Arxiv], [slides].
Tighter Low-rank Approximation via Sampling the Leveraged Element
Srinadh Bhojanapalli, Prateek Jain, Sujay Sanghavi
SODA 2015
[Arxiv], [SODA], [slides].
Completing any Low-rank Matrix, Provably
Yudong Chen, Srinadh Bhojanapalli, Sujay Sanghavi, Rachel Ward
JMLR 2015
[Arxiv], [JMLR].
Universal Matrix Completion
Srinadh Bhojanapalli, Prateek Jain
ICML 2014
[Arxiv], [ICML], [slides], [video].
Coherent Matrix Completion
Yudong Chen, Srinadh Bhojanapalli, Sujay Sanghavi, Rachel Ward
ICML 2014
[Arxiv], [ICML], [slides], [video].

PhD Thesis

Large Scale Matrix Factorization with Guarantees: Sampling and Bi-linearity [pdf]
UT Austin, 2015.