Srinadh Bhojanapalli


I am a staff research scientist at Google Research in New York. My main research focus these days is efficient long-context methods for Gemini models, in particular 1) efficient attention (Sparse Transformers, Treeformer) and 2) long-context generalization (FIRE). I am especially excited about our recent work enabling Transformers to achieve SOTA length generalization on arithmetic tasks (Position Coupling, CoT).

Earlier, I was a research assistant professor at TTI Chicago. I obtained my PhD in ECE at The University of Texas at Austin, where I was advised by Prof. Sujay Sanghavi. Before coming to UT, I spent four wonderful years pursuing my undergraduate studies at the Indian Institute of Technology Bombay.

Contact: bsrinadh [at] google [dot] com.

News

  • November 2024 - Neurips 2024 accepted our paper on position coupling, which improves length generalization of Transformers on addition up to 500 digits!

  • November 2024 - 2 new exciting papers.

    • Multi-level coupling and CoT, which enable Transformers to length-generalize on multi-operand addition and 2-operand multiplication for the first time!

    • Mimetic Initialization for Mamba SSMs, which enables length generalization on copying-style tasks, showing that Mamba models with standard state sizes suffer more from training challenges than from a lack of expressiveness.

  • July 2024 - We have a new paper introducing position coupling, which improves length generalization of Transformers on addition up to 500 digits! This will be presented at the LCFM workshop at ICML 2024.

  • May 2024 - I will be attending ICLR 2024. Ping me if you will be around; it will be great to meet up!

  • Feb 2024 - Visiting UT Austin to present our latest work on Long Context Foundational Models.

  • 2 papers accepted to ICLR 2024.

Current/former Interns

  • Asher Trockman (Grad student at CMU)

  • Shanda Li (Grad student at CMU)

  • Samy Jelassi (Grad student at Princeton)

  • Zhiyuan Li (Grad student at Princeton)

  • Wei Hu (Asst. Prof. at U. Michigan)

  • Chen Zhu (Research Scientist at Google)

  • Jingzhao Zhang (Asst. Prof. at Tsinghua)

  • Chulhee “Charlie” Yun (Asst. Prof. at KAIST)

  • Ayush Sekhari (Postdoc at MIT)

Papers

  • Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count
    Hanseul Cho, Jaeyoung Cha, Srinadh Bhojanapalli, Chulhee Yun
    [Arxiv]

  • Mimetic Initialization Helps State Space Models Learn to Recall
    Asher Trockman, Hrayr Harutyunyan, J. Zico Kolter, Sanjiv Kumar, Srinadh Bhojanapalli
    [Arxiv]

  • Position Coupling: Leveraging Task Structure for Improved Length Generalization of Transformers
    Hanseul Cho, Jaeyoung Cha, Pranjal Awasthi, Srinadh Bhojanapalli, Anupam Gupta, Chulhee Yun
    Neurips 2024
    Long Context Foundation Models Workshop at ICML 2024
    [Arxiv]

  • Efficient Language Model Architectures for Differentially Private Federated Learning
    Jae Hun Ro, Srinadh Bhojanapalli, Zheng Xu, Yanxiang Zhang, Ananda Theertha Suresh
    Privacy Regulation and Protection in Machine Learning Workshop at ICLR 2024
    [Arxiv]

  • HiRE: High Recall Approximate Top-k Estimation for Efficient LLM Inference
    Yashas Samaga B L, Varun Yerram, Chong You, Srinadh Bhojanapalli, Sanjiv Kumar, Prateek Jain, Praneeth Netrapalli
    [Arxiv]

  • Functional Interpolation for Relative Positions Improves Long Context Transformers
    Shanda Li, Chong You, Guru Guruganesh, Joshua Ainslie, Santiago Ontanon, Manzil Zaheer, Sumit Sanghai, Yiming Yang, Sanjiv Kumar, Srinadh Bhojanapalli
    ICLR 2024
    [Arxiv]

  • Efficacy of Dual-Encoders for Extreme Multi-Label Classification
    Nilesh Gupta, Devvrit Khatri, Ankit S Rawat, Srinadh Bhojanapalli, Prateek Jain, Inderjit S Dhillon
    ICLR 2024
    [Arxiv]

  • Depth Dependence of muP Learning Rates in ReLU MLPs
    Samy Jelassi, Boris Hanin, Ziwei Ji, Sashank J Reddi, Srinadh Bhojanapalli, Sanjiv Kumar
    [Arxiv]

  • On student-teacher deviations in distillation: does it pay to disobey?
    Vaishnavh Nagarajan, Aditya Krishna Menon, Srinadh Bhojanapalli, Hossein Mobahi, Sanjiv Kumar
    Neurips 2023
    [Arxiv]

  • Large Models are Parsimonious Learners: Activation Sparsity in Trained Transformers
    Zonglin Li, Chong You, Srinadh Bhojanapalli, Daliang Li, Ankit Singh Rawat, Sashank J Reddi, Ke Ye, Felix Chern, Felix Yu, Ruiqi Guo, Sanjiv Kumar
    ICLR 2023
    [Arxiv]

  • Treeformer: Dense Gradient Trees for Efficient Attention Computation
    Lovish Madaan, Srinadh Bhojanapalli, Himanshu Jain, Prateek Jain
    ICLR 2023
    [Arxiv]

  • On the Adversarial Robustness of Mixture of Experts
    Joan Puigcerver, Rodolphe Jenatton, Carlos Riquelme Ruiz, Pranjal Awasthi, Srinadh Bhojanapalli
    Neurips 2022
    [Arxiv]

  • Teacher's pet: understanding and mitigating biases in distillation
    Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar
    TMLR 2022
    [Arxiv]

  • Robust Training of Neural Networks Using Scale Invariant Architectures
    Zhiyuan Li, Srinadh Bhojanapalli, Manzil Zaheer, Sashank Reddi, Sanjiv Kumar
    ICML 2022
    [Arxiv]

  • Leveraging redundancy in attention with Reuse Transformers
    Srinadh Bhojanapalli*, Ayan Chakrabarti*, Andreas Veit, Michal Lukasik, Himanshu Jain, Frederick Liu, Yin-Wen Chang, Sanjiv Kumar
    Preprint 2021
    [Arxiv]

  • Eigen Analysis of Self-Attention and its Reconstruction from Partial Computation [*]
    Srinadh Bhojanapalli, Ayan Chakrabarti, Himanshu Jain, Sanjiv Kumar, Michal Lukasik, Andreas Veit
    Preprint 2021
    [Arxiv]

  • Demystifying the Better Performance of Position Encoding Variants for Transformer
    Pu-Chin Chen*, Henry Tsai*, Srinadh Bhojanapalli*, Hyung Won Chung, Yin-Wen Chang, Chun-Sung Ferng
    EMNLP 2021
    [Arxiv]

  • Understanding Robustness of Transformers for Image Classification [*]
    Srinadh Bhojanapalli, Ayan Chakrabarti, Daniel Glasner, Daliang Li, Thomas Unterthiner, Andreas Veit
    ICCV 2021
    [Arxiv]

  • On the Reproducibility of Neural Network Predictions
    Srinadh Bhojanapalli, Kimberly Wilber, Andreas Veit, Ankit Singh Rawat, Seungyeon Kim, Aditya Menon, Sanjiv Kumar
    Preprint 2021
    [Arxiv]

  • Modifying Memories in Transformer Models
    Chen Zhu, Ankit Singh Rawat, Manzil Zaheer, Srinadh Bhojanapalli, Daliang Li, Felix Yu, Sanjiv Kumar
    Preprint 2020
    [Arxiv]

  • Coping with Label Shift via Distributionally Robust Optimisation
    Jingzhao Zhang, Aditya Menon, Andreas Veit, Srinadh Bhojanapalli, Sanjiv Kumar, Suvrit Sra
    ICLR 2021
    [Openreview], [Arxiv]

  • An efficient nonconvex reformulation of stagewise convex optimization problems
    Rudy Bunel, Oliver Hinder, Srinadh Bhojanapalli, Krishnamurthy (Dj) Dvijotham
    Neurips 2020
    [Arxiv]

  • Semantic label smoothing for sequence to sequence problems
    Michal Lukasik, Himanshu Jain, Aditya Menon, Seungyeon Kim, Srinadh Bhojanapalli, Felix Yu, Sanjiv Kumar
    EMNLP 2020
    [Arxiv]

  • O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers
    Chulhee Yun, Yin-Wen Chang, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar
    Neurips 2020
    [Arxiv].

  • Does label smoothing mitigate label noise?
    Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar
    ICML 2020
    [Arxiv].

  • Low-Rank Bottleneck in Multi-head Attention Models
    Srinadh Bhojanapalli, Chulhee Yun, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar
    ICML 2020
    [Arxiv].

  • Are Transformers universal approximators of sequence-to-sequence functions?
    Chulhee Yun, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar
    ICLR 2020
    [Openreview].

  • Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
    Yang You, Jing Li, Sashank Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, James Demmel, Kurt Keutzer, Cho-Jui Hsieh
    ICLR 2020
    [Openreview], [Arxiv].

  • The role of over-parametrization in generalization of neural networks
    Behnam Neyshabur, Zhiyuan Li, Srinadh Bhojanapalli, Yann LeCun, Nathan Srebro
    ICLR 2019
    [Openreview], [Arxiv].

  • Smoothed analysis for low-rank solutions to semidefinite programs in quadratic penalty form [*]
    Srinadh Bhojanapalli, Nicolas Boumal, Prateek Jain, Praneeth Netrapalli
    COLT 2018
    [Arxiv].

  • A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks
    Behnam Neyshabur, Srinadh Bhojanapalli, Nathan Srebro
    ICLR 2018
    [Arxiv].

  • Exploring Generalization in Deep Learning
    Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, Nathan Srebro
    NIPS 2017
    [Arxiv].

  • Implicit Regularization in Matrix Factorization
    Suriya Gunasekar, Blake Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro
    NIPS 2017

  • Stabilizing GAN Training with Multiple Random Projections
    Behnam Neyshabur, Srinadh Bhojanapalli, Ayan Chakrabarti
    preprint 2017, [Arxiv], [Project website]. [Arxiv], [slides].

  • Single Pass PCA of Matrix Products
    Shanshan Wu, Srinadh Bhojanapalli, Sujay Sanghavi, Alex Dimakis
    NIPS 2016
    [Arxiv], [SPARK code].

  • Global Optimality of Local Search for Low Rank Matrix Recovery
    Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro
    NIPS 2016
    [Arxiv].

  • Provable Burer-Monteiro factorization for a class of norm-constrained matrix problems
    Dohyung Park, Anastasios Kyrillidis, Srinadh Bhojanapalli, Constantine Caramanis, Sujay Sanghavi
    preprint 2016, [Arxiv].

  • Dropping Convexity for Faster Semi-definite Optimization
    Srinadh Bhojanapalli, Anastasios Kyrillidis, Sujay Sanghavi
    COLT 2016
    [Arxiv], [COLT], [slides].

  • A New Sampling Technique for Tensors
    Srinadh Bhojanapalli, Sujay Sanghavi
    preprint 2015, [Arxiv], [slides].

  • Tighter Low-rank Approximation via Sampling the Leveraged Element
    Srinadh Bhojanapalli, Prateek Jain, Sujay Sanghavi
    SODA 2015
    [Arxiv], [SODA], [slides].

  • Completing any Low-rank Matrix, Provably
    Yudong Chen, Srinadh Bhojanapalli, Sujay Sanghavi, Rachel Ward
    JMLR 2015
    [Arxiv], [JMLR].

  • Universal Matrix Completion
    Srinadh Bhojanapalli, Prateek Jain
    ICML 2014
    [Arxiv], [ICML], [slides], [video].

  • Coherent Matrix Completion
    Yudong Chen, Srinadh Bhojanapalli, Sujay Sanghavi, Rachel Ward
    ICML 2014
    [Arxiv], [ICML], [slides], [video].

PhD Thesis

Large Scale Matrix Factorization with Guarantees: Sampling and Bi-linearity [pdf]
UT Austin, 2015.