
Showing 1–50 of 120 results for author: Ramchandran, K

  1. arXiv:2404.08335  [pdf, other]

    cs.CL cs.LG

    Toward a Theory of Tokenization in LLMs

    Authors: Nived Rajaraman, Jiantao Jiao, Kannan Ramchandran

    Abstract: While there has been a large body of research attempting to circumvent tokenization for language modeling (Clark et al., 2022; Xue et al., 2022), the current consensus is that it is a necessary initial step for designing state-of-the-art performant language models. In this paper, we investigate tokenization from a theoretical point of view by studying the behavior of transformers on simple data ge… ▽ More

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 58 pages, 10 figures

  2. arXiv:2402.02631  [pdf, other]

    cs.LG

    Learning to Understand: Identifying Interactions via the Möbius Transform

    Authors: Justin S. Kang, Yigit E. Erginbas, Landon Butler, Ramtin Pedarsani, Kannan Ramchandran

    Abstract: One of the key challenges in machine learning is to find interpretable representations of learned functions. The Möbius transform is essential for this purpose, as its coefficients correspond to unique importance scores for sets of input variables. This transform is closely related to widely used game-theoretic notions of importance like the Shapley and Banzhaf values, but it also captures crucial… ▽ More

    Submitted 15 June, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: 34 pages, 16 figures

  3. arXiv:2312.07930  [pdf, other]

    cs.LG cs.CL cs.CR cs.IT stat.ML

    Towards Optimal Statistical Watermarking

    Authors: Baihe Huang, Hanlin Zhu, Banghua Zhu, Kannan Ramchandran, Michael I. Jordan, Jason D. Lee, Jiantao Jiao

    Abstract: We study statistical watermarking by formulating it as a hypothesis testing problem, a general framework which subsumes all previous statistical watermarking methods. Key to our formulation is a coupling of the output tokens and the rejection region, realized by pseudo-random generators in practice, that allows non-trivial trade-offs between the Type I error and Type II error. We characterize the… ▽ More

    Submitted 6 February, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

  4. arXiv:2310.00212  [pdf, other]

    cs.LG cs.AI cs.CL

    Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment

    Authors: Tianhao Wu, Banghua Zhu, Ruoyu Zhang, Zhaojin Wen, Kannan Ramchandran, Jiantao Jiao

    Abstract: Large Language Models (LLMs) can acquire extensive world knowledge through pre-training on large corpora. However, due to exposure to low-quality data, LLMs may exhibit harmful behavior without aligning with human values. The dominant approach for steering LLMs towards beneficial behavior involves Reinforcement Learning with Human Feedback (RLHF), with Proximal Policy Optimization (PPO) serving as… ▽ More

    Submitted 9 October, 2023; v1 submitted 29 September, 2023; originally announced October 2023.

    Comments: 19 pages, 5 figures

  5. arXiv:2303.14795  [pdf, other]

    eess.IV eess.SP

    MRI Reconstruction with Side Information using Diffusion Models

    Authors: Brett Levac, Ajil Jalal, Kannan Ramchandran, Jonathan I. Tamir

    Abstract: Magnetic resonance imaging (MRI) exam protocols consist of multiple contrast-weighted images of the same anatomy to emphasize different tissue properties. Due to the long acquisition times required to collect fully sampled k-space measurements, it is common to only collect a fraction of k-space for each scan and subsequently solve independent inverse problems for each image contrast. Recently, the… ▽ More

    Submitted 6 June, 2023; v1 submitted 26 March, 2023; originally announced March 2023.

  6. arXiv:2303.11453  [pdf, other]

    cs.LG stat.ML

    Greedy Pruning with Group Lasso Provably Generalizes for Matrix Sensing

    Authors: Nived Rajaraman, Devvrit, Aryan Mokhtari, Kannan Ramchandran

    Abstract: Pruning schemes have been widely used in practice to reduce the complexity of trained models with a massive number of parameters. In fact, several practical studies have shown that if a pruned model is fine-tuned with some gradient-based updates it generalizes well to new samples. Although the above pipeline, which we refer to as pruning + fine-tuning, has been extremely successful in lowering the… ▽ More

    Submitted 4 June, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: 49 pages, 2 figures

  7. arXiv:2302.06025  [pdf, ps, other]

    stat.ML cs.IT cs.LG math.ST

    Statistical Complexity and Optimal Algorithms for Non-linear Ridge Bandits

    Authors: Nived Rajaraman, Yanjun Han, Jiantao Jiao, Kannan Ramchandran

    Abstract: We consider the sequential decision-making problem where the mean outcome is a non-linear function of the chosen action. Compared with the linear model, two curious phenomena arise in non-linear models: first, in addition to the "learning phase" with a standard parametric rate for estimation or regret, there is a "burn-in period" with a fixed cost determined by the non-linear function; second, ac… ▽ More

    Submitted 9 January, 2024; v1 submitted 12 February, 2023; originally announced February 2023.

    Comments: Revised Section 3 and added an upper bound agnostic to the link function $f$

  8. arXiv:2301.13336  [pdf, other]

    cs.LG cs.CR cs.GT

    The Fair Value of Data Under Heterogeneous Privacy Constraints in Federated Learning

    Authors: Justin Kang, Ramtin Pedarsani, Kannan Ramchandran

    Abstract: Modern data aggregation often involves a platform collecting data from a network of users with various privacy options. Platforms must solve the problem of how to allocate incentives to users to convince them to share their data. This paper puts forth an idea for a fair amount to compensate users for their data at a given privacy level based on an axiomatic definition of fairness, along t… ▽ More

    Submitted 4 February, 2024; v1 submitted 30 January, 2023; originally announced January 2023.

    Comments: 29 pages, 5 figures, Accepted to TMLR

  9. arXiv:2301.06200  [pdf, other]

    eess.SP cs.LG

    Efficiently Computing Sparse Fourier Transforms of $q$-ary Functions

    Authors: Yigit Efe Erginbas, Justin Singh Kang, Amirali Aghazadeh, Kannan Ramchandran

    Abstract: Fourier transformations of pseudo-Boolean functions are popular tools for analyzing functions of binary sequences. Real-world functions often have structures that manifest in a sparse Fourier transform, and previous works have shown that under the assumption of sparsity the transform can be computed efficiently. But what if we want to compute the Fourier transform of functions defined over a $q$-a… ▽ More

    Submitted 15 January, 2023; originally announced January 2023.

    Comments: 29 pages, 3 figures

  10. arXiv:2212.06891  [pdf, other]

    cs.LG cs.GT

    Interactive Learning with Pricing for Optimal and Stable Allocations in Markets

    Authors: Yigit Efe Erginbas, Soham Phade, Kannan Ramchandran

    Abstract: Large-scale online recommendation systems must facilitate the allocation of a limited number of items among competing users while learning their preferences from user feedback. As a principled way of incorporating market constraints and user incentives in the design, we consider our objectives to be two-fold: maximal social welfare with minimal instability. To maximize social welfare, our proposed… ▽ More

    Submitted 13 December, 2022; originally announced December 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2207.04143

  11. arXiv:2210.02604  [pdf, other]

    stat.ML cs.LG

    Spectral Regularization Allows Data-frugal Learning over Combinatorial Spaces

    Authors: Amirali Aghazadeh, Nived Rajaraman, Tony Tu, Kannan Ramchandran

    Abstract: Data-driven machine learning models are being increasingly employed in several important inference problems in biology, chemistry, and physics which require learning over combinatorial spaces. Recent empirical evidence (see, e.g., [1], [2], [3]) suggests that regularizing the spectral representation of such models improves their generalization power when labeled data is scarce. However, despite th… ▽ More

    Submitted 5 October, 2022; originally announced October 2022.

  12. arXiv:2207.04143  [pdf, other]

    cs.LG cs.GT cs.IR

    Interactive Recommendations for Optimal Allocations in Markets with Constraints

    Authors: Yigit Efe Erginbas, Soham Phade, Kannan Ramchandran

    Abstract: Recommendation systems when employed in markets play a dual role: they assist users in selecting their most desired items from a large pool and they help in allocating a limited number of items to the users who desire them the most. Despite the prevalence of capacity constraints on allocations in many real-world recommendation settings, a principled way of incorporating them in the design of these… ▽ More

    Submitted 28 July, 2022; v1 submitted 8 July, 2022; originally announced July 2022.

  13. arXiv:2206.10341  [pdf, other]

    cs.CR cs.AI cs.LG

    Neurotoxin: Durable Backdoors in Federated Learning

    Authors: Zhengming Zhang, Ashwinee Panda, Linyue Song, Yaoqing Yang, Michael W. Mahoney, Joseph E. Gonzalez, Kannan Ramchandran, Prateek Mittal

    Abstract: Due to their decentralized nature, federated learning (FL) systems have an inherent vulnerability during their training to adversarial backdoor attacks. In this type of attack, the goal of the attacker is to use poisoned updates to implant so-called backdoors into the learned model such that, at test time, the model's outputs can be fixed to a given target for certain inputs. (As a simple toy exam… ▽ More

    Submitted 12 June, 2022; originally announced June 2022.

    Comments: Appears in ICML 2022

  14. arXiv:2206.00120  [pdf, other]

    stat.ML cs.IT cs.LG

    Decentralized Competing Bandits in Non-Stationary Matching Markets

    Authors: Avishek Ghosh, Abishek Sankararaman, Kannan Ramchandran, Tara Javidi, Arya Mazumdar

    Abstract: Understanding complex dynamics of two-sided online matching markets, where the demand-side agents compete to match with the supply-side (arms), has recently received substantial interest. To that end, in this paper, we introduce the framework of decentralized two-sided matching markets under non-stationary (dynamic) environments. We adhere to the serial dictatorship setting, where the demand-side a… ▽ More

    Submitted 31 May, 2022; originally announced June 2022.

  15. arXiv:2205.15397  [pdf, other]

    cs.LG stat.ML

    Minimax Optimal Online Imitation Learning via Replay Estimation

    Authors: Gokul Swamy, Nived Rajaraman, Matthew Peng, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu, Jiantao Jiao, Kannan Ramchandran

    Abstract: Online imitation learning is the problem of how best to mimic expert demonstrations, given access to the environment or an accurate simulator. Prior work has shown that in the infinite sample regime, exact moment matching achieves value equivalence to the expert policy. However, in the finite sample regime, even if one has no optimization error, empirical variance can lead to a performance gap tha… ▽ More

    Submitted 14 January, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

  16. arXiv:2202.02842  [pdf, other]

    cs.CL cs.LG

    Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data

    Authors: Yaoqing Yang, Ryan Theisen, Liam Hodgkinson, Joseph E. Gonzalez, Kannan Ramchandran, Charles H. Martin, Michael W. Mahoney

    Abstract: Selecting suitable architecture parameters and training hyperparameters is essential for enhancing machine learning (ML) model performance. Several recent empirical studies conduct large-scale correlational analysis on neural networks (NNs) to search for effective generalization metrics that can guide this type of model selection. Effective metrics are typically expected to correlate strong… ▽ More

    Submitted 4 June, 2023; v1 submitted 6 February, 2022; originally announced February 2022.

    Journal ref: Proceedings of the 29th ACM SIGKDD international conference on knowledge discovery and data mining (2023)

  17. arXiv:2107.11228  [pdf, other]

    cs.LG

    Taxonomizing local versus global structure in neural network loss landscapes

    Authors: Yaoqing Yang, Liam Hodgkinson, Ryan Theisen, Joe Zou, Joseph E. Gonzalez, Kannan Ramchandran, Michael W. Mahoney

    Abstract: Viewing neural network models in terms of their loss landscapes has a long history in the statistical mechanics approach to learning, and in recent years it has received attention within machine learning proper. Among other things, local metrics (such as the smoothness of the loss landscape) have been shown to correlate with global properties of the model (such as good generalization performance).… ▽ More

    Submitted 12 December, 2021; v1 submitted 23 July, 2021; originally announced July 2021.

    Journal ref: Thirty-fifth Annual Conference on Neural Information Processing Systems, 2021

  18. arXiv:2107.05849  [pdf, ps, other]

    stat.ML cs.IT cs.LG

    Model Selection for Generic Reinforcement Learning

    Authors: Avishek Ghosh, Sayak Ray Chowdhury, Kannan Ramchandran

    Abstract: We address the problem of model selection for the finite horizon episodic Reinforcement Learning (RL) problem where the transition kernel $P^*$ belongs to a family of models $\mathcal{P}^*$ with finite metric entropy. In the model selection framework, instead of $\mathcal{P}^*$, we are given $M$ nested families of transition kernels $\mathcal{P}_1 \subset \mathcal{P}_2 \subset \ldots \subset \mathcal{P}_M$. We propose an… ▽ More

    Submitted 9 December, 2021; v1 submitted 13 July, 2021; originally announced July 2021.

  19. arXiv:2107.03455  [pdf, ps, other]

    stat.ML cs.IT cs.LG

    Model Selection for Generic Contextual Bandits

    Authors: Avishek Ghosh, Abishek Sankararaman, Kannan Ramchandran

    Abstract: We consider the problem of model selection for the general stochastic contextual bandits under the realizability assumption. We propose a successive refinement based algorithm called Adaptive Contextual Bandit (ACB), that works in phases and successively eliminates model classes that are too simple to fit the given instance. We prove that this algorithm is adaptive, i.e., the regret ra… ▽ More

    Submitted 20 July, 2023; v1 submitted 7 July, 2021; originally announced July 2021.

    Comments: Accepted at IEEE Transactions on Information Theory. arXiv admin note: text overlap with arXiv:2006.02612

  20. arXiv:2106.08902  [pdf, other]

    stat.ML cs.LG

    Adaptive Clustering and Personalization in Multi-Agent Stochastic Linear Bandits

    Authors: Avishek Ghosh, Abishek Sankararaman, Kannan Ramchandran

    Abstract: We consider the problem of minimizing regret in an $N$ agent heterogeneous stochastic linear bandits framework, where the agents (users) are similar but not all identical. We model user heterogeneity using two popularly used ideas in practice; (i) A clustering framework where users are partitioned into groups with users in the same group being identical to each other, but different across groups,… ▽ More

    Submitted 2 February, 2022; v1 submitted 14 June, 2021; originally announced June 2021.

    Comments: 25 pages, 8 figures

  21. arXiv:2105.07320  [pdf, other]

    cs.DC stat.ML

    LocalNewton: Reducing Communication Bottleneck for Distributed Learning

    Authors: Vipul Gupta, Avishek Ghosh, Michal Derezinski, Rajiv Khanna, Kannan Ramchandran, Michael Mahoney

    Abstract: To address the communication bottleneck problem in distributed optimization within a master-worker framework, we propose LocalNewton, a distributed second-order algorithm with local averaging. In LocalNewton, the worker machines update their model in every iteration by finding a suitable second-order descent direction using only the data and model stored in their own local memory. We let the worke… ▽ More

    Submitted 15 May, 2021; originally announced May 2021.

    Comments: To be published in Uncertainty in Artificial Intelligence (UAI) 2021

  22. arXiv:2103.09424  [pdf, other]

    cs.DC cs.LG math.OC stat.ML

    Escaping Saddle Points in Distributed Newton's Method with Communication Efficiency and Byzantine Resilience

    Authors: Avishek Ghosh, Raj Kumar Maity, Arya Mazumdar, Kannan Ramchandran

    Abstract: The problem of saddle-point avoidance for non-convex optimization is quite challenging in large scale distributed learning frameworks, such as Federated Learning, especially in the presence of Byzantine workers. The celebrated cubic-regularized Newton method of Nesterov and Polyak is one of the most elegant ways to avoid saddle-points in the standard centralized (non-distributed) setup. In this paper, we… ▽ More

    Submitted 25 December, 2021; v1 submitted 16 March, 2021; originally announced March 2021.

  23. arXiv:2102.12948  [pdf, ps, other]

    cs.LG stat.ML

    Provably Breaking the Quadratic Error Compounding Barrier in Imitation Learning, Optimally

    Authors: Nived Rajaraman, Yanjun Han, Lin F. Yang, Kannan Ramchandran, Jiantao Jiao

    Abstract: We study the statistical limits of Imitation Learning (IL) in episodic Markov Decision Processes (MDPs) with a state space $\mathcal{S}$. We focus on the known-transition setting where the learner is provided a dataset of $N$ length-$H$ trajectories from a deterministic expert policy and knows the MDP transition. We establish an upper bound $O(|\mathcal{S}|H^{3/2}/N)$ for the suboptimality using t… ▽ More

    Submitted 25 February, 2021; originally announced February 2021.

    Comments: 30 pages, 2 figures

  24. arXiv:2010.13829  [pdf, other]

    cs.LG

    BEAR: Sketching BFGS Algorithm for Ultra-High Dimensional Feature Selection in Sublinear Memory

    Authors: Amirali Aghazadeh, Vipul Gupta, Alex DeWeese, O. Ozan Koyluoglu, Kannan Ramchandran

    Abstract: We consider feature selection for applications in machine learning where the dimensionality of the data is so large that it exceeds the working memory of the (local) computing machine. Unfortunately, current large-scale sketching algorithms show poor memory-accuracy trade-off due to the irreversible collision and accumulation of the stochastic gradient noise in the sketched domain. Here, we develo… ▽ More

    Submitted 26 May, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

  25. arXiv:2010.08899  [pdf, other]

    cs.LG cs.DC stat.ML

    Training Recommender Systems at Scale: Communication-Efficient Model and Data Parallelism

    Authors: Vipul Gupta, Dhruv Choudhary, Ping Tak Peter Tang, Xiaohan Wei, Xing Wang, Yuzhen Huang, Arun Kejariwal, Kannan Ramchandran, Michael W. Mahoney

    Abstract: In this paper, we consider hybrid parallelism -- a paradigm that employs both Data Parallelism (DP) and Model Parallelism (MP) -- to scale distributed training of large recommendation models. We propose a compression framework called Dynamic Communication Thresholding (DCT) for communication-efficient hybrid training. DCT filters the entities to be communicated across the network through a simple… ▽ More

    Submitted 21 May, 2021; v1 submitted 17 October, 2020; originally announced October 2020.

    Comments: 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2021)

  26. arXiv:2010.00217  [pdf, other]

    cs.CR

    CoVer: Collaborative Light-Node-Only Verification and Data Availability for Blockchains

    Authors: Steven Cao, Swanand Kadhe, Kannan Ramchandran

    Abstract: Validating a blockchain incurs heavy computation, communication, and storage costs. As a result, clients with limited resources, called light nodes, cannot verify transactions independently and must trust full nodes, making them vulnerable to security attacks. Motivated by this problem, we ask a fundamental question: can light nodes securely validate without any full nodes? We answer affirmatively… ▽ More

    Submitted 1 October, 2020; originally announced October 2020.

    Comments: IEEE Blockchain 2020

  27. arXiv:2009.11248  [pdf, other]

    cs.CR cs.IT cs.LG stat.ML

    FastSecAgg: Scalable Secure Aggregation for Privacy-Preserving Federated Learning

    Authors: Swanand Kadhe, Nived Rajaraman, O. Ozan Koyluoglu, Kannan Ramchandran

    Abstract: Recent attacks on federated learning demonstrate that keeping the training data on clients' devices does not provide sufficient privacy, as the model parameters shared by clients can leak information about their training data. A 'secure aggregation' protocol enables the server to aggregate clients' models in a privacy-preserving manner. However, existing secure aggregation protocols incur high com… ▽ More

    Submitted 23 September, 2020; originally announced September 2020.

    Comments: Shorter version accepted in ICML Workshop on Federated Learning, July 2020, and CCS Workshop on Privacy-Preserving Machine Learning in Practice, November 2020

  28. arXiv:2008.07793  [pdf, other]

    cs.DC cs.GT

    Utility-based Resource Allocation and Pricing for Serverless Computing

    Authors: Vipul Gupta, Soham Phade, Thomas Courtade, Kannan Ramchandran

    Abstract: Serverless computing platforms currently rely on basic pricing schemes that are static and do not reflect customer feedback. This leads to significant inefficiencies from a total utility perspective. As one of the fastest-growing cloud services, serverless computing provides an opportunity to better serve both users and providers through the incorporation of market-based strategies for pricing and… ▽ More

    Submitted 24 January, 2022; v1 submitted 18 August, 2020; originally announced August 2020.

    Comments: 31 pages, 10 figures

  29. arXiv:2007.05086  [pdf, other]

    cs.LG stat.ML

    Boundary thickness and robustness in learning models

    Authors: Yaoqing Yang, Rajiv Khanna, Yaodong Yu, Amir Gholami, Kurt Keutzer, Joseph E. Gonzalez, Kannan Ramchandran, Michael W. Mahoney

    Abstract: Robustness of machine learning models to various adversarial and non-adversarial corruptions continues to be of interest. In this paper, we introduce the notion of the boundary thickness of a classifier, and we describe its connection with and usefulness for model robustness. Thick decision boundaries lead to improved performance, while thin decision boundaries lead to overfitting (e.g., measured… ▽ More

    Submitted 12 January, 2021; v1 submitted 9 July, 2020; originally announced July 2020.

    Journal ref: NeurIPS 2020

  30. arXiv:2006.04088  [pdf, other]

    stat.ML cs.LG

    An Efficient Framework for Clustered Federated Learning

    Authors: Avishek Ghosh, Jichan Chung, Dong Yin, Kannan Ramchandran

    Abstract: We address the problem of federated learning (FL) where users are distributed and partitioned into clusters. This setup captures settings where different groups of users have their own objectives (learning tasks) but by aggregating their data with others in the same cluster (same learning task), they can leverage the strength in numbers in order to perform more efficient federated learning. For th… ▽ More

    Submitted 8 June, 2021; v1 submitted 7 June, 2020; originally announced June 2020.

    Comments: Preliminary results appeared at NeurIPS 2020

  31. arXiv:2006.02612  [pdf, ps, other]

    stat.ML cs.LG

    Problem-Complexity Adaptive Model Selection for Stochastic Linear Bandits

    Authors: Avishek Ghosh, Abishek Sankararaman, Kannan Ramchandran

    Abstract: We consider the problem of model selection for two popular stochastic linear bandit settings, and propose algorithms that adapt to the unknown problem complexity. In the first setting, we consider the $K$ armed mixture bandits, where the mean reward of arm $i \in [K]$ is $\mu_i + \langle \alpha_{i,t}, \theta^* \rangle$, with $\alpha_{i,t} \in \mathbb{R}^d$ being the known context vector and $\mu_i \in [-1,1]$ and… ▽ More

    Submitted 15 June, 2020; v1 submitted 3 June, 2020; originally announced June 2020.

    Comments: 24 pages, 8 figures

  32. arXiv:2005.07184  [pdf, other]

    cs.IT cs.DC cs.LG

    Communication-Efficient Gradient Coding for Straggler Mitigation in Distributed Learning

    Authors: Swanand Kadhe, O. Ozan Koyluoglu, Kannan Ramchandran

    Abstract: Distributed implementations of gradient-based methods, wherein a server distributes gradient computations across worker machines, need to overcome two limitations: delays caused by slow running machines called 'stragglers', and communication overheads. Recently, Ye and Abbe [ICML 2018] proposed a coding-theoretic paradigm to characterize a fundamental trade-off between computation load per worker,… ▽ More

    Submitted 14 May, 2020; originally announced May 2020.

    Comments: Shorter version accepted in 2020 IEEE International Symposium on Information Theory (ISIT)

  33. arXiv:2004.10914  [pdf, other]

    stat.ML cs.LG

    Alternating Minimization Converges Super-Linearly for Mixed Linear Regression

    Authors: Avishek Ghosh, Kannan Ramchandran

    Abstract: We address the problem of solving mixed random linear equations. We have unlabeled observations coming from multiple linear regressions, and each observation corresponds to exactly one of the regression models. The goal is to learn the linear regressors from the observations. Classically, Alternating Minimization (AM) (which is a variant of Expectation Maximization (EM)) is used to solve this prob… ▽ More

    Submitted 11 August, 2020; v1 submitted 22 April, 2020; originally announced April 2020.

    Comments: Accepted for publication at AISTATS, 2020

  34. arXiv:2001.07490  [pdf, other]

    cs.DC cs.IT

    Serverless Straggler Mitigation using Local Error-Correcting Codes

    Authors: Vipul Gupta, Dominic Carrano, Yaoqing Yang, Vaishaal Shankar, Thomas Courtade, Kannan Ramchandran

    Abstract: Inexpensive cloud services, such as serverless computing, are often vulnerable to straggling nodes that increase end-to-end latency for distributed computation. We propose and implement simple yet principled approaches for straggler mitigation in serverless systems for matrix multiplication and evaluate them on several common applications from machine learning and high-performance computing. The p… ▽ More

    Submitted 21 January, 2020; originally announced January 2020.

  35. arXiv:1911.09721  [pdf, other]

    cs.LG cs.DC stat.ML

    Communication-Efficient and Byzantine-Robust Distributed Learning with Error Feedback

    Authors: Avishek Ghosh, Raj Kumar Maity, Swanand Kadhe, Arya Mazumdar, Kannan Ramchandran

    Abstract: We develop a communication-efficient distributed learning algorithm that is robust against Byzantine worker machines. We propose and analyze a distributed gradient-descent algorithm that performs a simple thresholding based on gradient norms to mitigate Byzantine failures. We show the (statistical) error-rate of our algorithm matches that of Yin et al., which uses more complicated sche… ▽ More

    Submitted 14 August, 2021; v1 submitted 21 November, 2019; originally announced November 2019.

  36. arXiv:1906.12140  [pdf, ps, other]

    cs.CR cs.DC cs.IT

    SeF: A Secure Fountain Architecture for Slashing Storage Costs in Blockchains

    Authors: Swanand Kadhe, Jichan Chung, Kannan Ramchandran

    Abstract: Full nodes, which synchronize the entire blockchain history and independently validate all the blocks, form the backbone of any blockchain network by playing a vital role in ensuring security properties. On the other hand, a user running a full node needs to pay a heavy price in terms of storage costs. E.g., the Bitcoin blockchain size has grown over 215GB, in spite of its low throughput. The ledg… ▽ More

    Submitted 28 June, 2019; originally announced June 2019.

  37. arXiv:1906.09255  [pdf, ps, other]

    stat.ML cs.IT cs.LG math.ST

    Max-Affine Regression: Provable, Tractable, and Near-Optimal Statistical Estimation

    Authors: Avishek Ghosh, Ashwin Pananjady, Adityanand Guntuboyina, Kannan Ramchandran

    Abstract: Max-affine regression refers to a model where the unknown regression function is modeled as a maximum of $k$ unknown affine functions for a fixed $k \geq 1$. This generalizes linear regression and (real) phase retrieval, and is closely related to convex regression. Working within a non-asymptotic framework, we study this problem in the high-dimensional setting assuming that $k$ is a fixed constant… ▽ More

    Submitted 21 June, 2019; originally announced June 2019.

    Comments: The first two authors contributed equally to this work and are ordered alphabetically

  38. arXiv:1906.06629  [pdf, other]

    cs.LG stat.ML

    Robust Federated Learning in a Heterogeneous Environment

    Authors: Avishek Ghosh, Justin Hong, Dong Yin, Kannan Ramchandran

    Abstract: We study a recently proposed large-scale distributed learning paradigm, namely Federated Learning, where the worker machines are end users' own devices. Statistical and computational challenges arise in Federated Learning particularly in the presence of heterogeneous data distribution (i.e., data points on different devices belong to different distributions signifying different clusters) and Byzan… ▽ More

    Submitted 9 October, 2019; v1 submitted 15 June, 2019; originally announced June 2019.

    Comments: Fixing technical issues. Please discard any previous version

  39. arXiv:1905.03864  [pdf, other]

    eess.AS cs.LG cs.SD

    Adversarially Trained Autoencoders for Parallel-Data-Free Voice Conversion

    Authors: Orhan Ocal, Oguz H. Elibol, Gokce Keskin, Cory Stephenson, Anil Thomas, Kannan Ramchandran

    Abstract: We present a method for converting the voices between a set of speakers. Our method is based on training multiple autoencoder paths, where there is a single speaker-independent encoder and multiple speaker-dependent decoders. The autoencoders are trained with an addition of an adversarial loss which is provided by an auxiliary classifier in order to guide the output of the encoder to be speaker in… ▽ More

    Submitted 9 May, 2019; originally announced May 2019.

  40. arXiv:1904.13373  [pdf, ps, other]

    cs.IT cs.DC cs.LG stat.ML

    Gradient Coding Based on Block Designs for Mitigating Adversarial Stragglers

    Authors: Swanand Kadhe, O. Ozan Koyluoglu, Kannan Ramchandran

    Abstract: Distributed implementations of gradient-based methods, wherein a server distributes gradient computations across worker machines, suffer from slow running machines, called 'stragglers'. Gradient coding is a coding-theoretic framework to mitigate stragglers by enabling the server to recover the gradient sum in the presence of stragglers. 'Approximate gradient codes' are variants of gradient codes t… ▽ More

    Submitted 30 April, 2019; originally announced April 2019.

    Comments: Shorter version accepted in 2019 IEEE International Symposium on Information Theory (ISIT)

  41. arXiv:1903.08857  [pdf, other]

    cs.DC cs.IT cs.LG

    OverSketched Newton: Fast Convex Optimization for Serverless Systems

    Authors: Vipul Gupta, Swanand Kadhe, Thomas Courtade, Michael W. Mahoney, Kannan Ramchandran

    Abstract: Motivated by recent developments in serverless systems for large-scale computation as well as improvements in scalable randomized matrix algorithms, we develop OverSketched Newton, a randomized Hessian-based optimization algorithm to solve large-scale convex optimization problems in serverless systems. OverSketched Newton leverages matrix sketching ideas from Randomized Numerical Linear Algebra to… ▽ More

    Submitted 27 August, 2020; v1 submitted 21 March, 2019; originally announced March 2019.

    Comments: 37 pages, 12 figures

  42. arXiv:1901.08360  [pdf, other]

    cs.LG stat.ML

    Cross-Entropy Loss and Low-Rank Features Have Responsibility for Adversarial Examples

    Authors: Kamil Nar, Orhan Ocal, S. Shankar Sastry, Kannan Ramchandran

    Abstract: State-of-the-art neural networks are vulnerable to adversarial examples; they can easily misclassify inputs that are imperceptibly different than their training and test data. In this work, we establish that the use of cross-entropy loss function and the low-rank features of the training data have responsibility for the existence of these inputs. Based on this observation, we suggest that addressi…

    Submitted 24 January, 2019; originally announced January 2019.

  43. arXiv:1811.02702  [pdf, other]

    cs.LG stat.ML

    Greedy Frank-Wolfe Algorithm for Exemplar Selection

    Authors: Gary Cheng, Armin Askari, Kannan Ramchandran, Laurent El Ghaoui

    Abstract: In this paper, we consider the problem of selecting representatives from a data set for arbitrary supervised/unsupervised learning tasks. We identify a subset $S$ of a data set $A$ such that 1) the size of $S$ is much smaller than $A$ and 2) $S$ efficiently describes the entire data set, in a way formalized via convex optimization. In order to generate $|S| = k$ exemplars, our kernelizable algorit…

    Submitted 22 February, 2020; v1 submitted 6 November, 2018; originally announced November 2018.

  44. OverSketch: Approximate Matrix Multiplication for the Cloud

    Authors: Vipul Gupta, Shusen Wang, Thomas Courtade, Kannan Ramchandran

    Abstract: We propose OverSketch, an approximate algorithm for distributed matrix multiplication in serverless computing. OverSketch leverages ideas from matrix sketching and high-performance computing to enable cost-efficient multiplication that is resilient to faults and straggling nodes pervasive in low-cost serverless architectures. We establish statistical guarantees on the accuracy of OverSketch and em…

    Submitted 21 February, 2019; v1 submitted 6 November, 2018; originally announced November 2018.

    Comments: Published in Proc. IEEE Big Data 2018. Updated version provides details of distributed sketching and highlights other advantages of OverSketch

    Journal ref: 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 2018, pp. 298-304

  45. arXiv:1810.11914  [pdf, other]

    cs.LG cs.CR cs.NE stat.ML

    Rademacher Complexity for Adversarially Robust Generalization

    Authors: Dong Yin, Kannan Ramchandran, Peter Bartlett

    Abstract: Many machine learning models are vulnerable to adversarial attacks; for example, adding adversarial perturbations that are imperceptible to humans can often make machine learning models produce wrong predictions with high confidence. Moreover, although we may obtain robust models on the training dataset via adversarial training, in some problems the learned models cannot generalize well to the tes…

    Submitted 29 July, 2020; v1 submitted 28 October, 2018; originally announced October 2018.

    Comments: ICML 2019

  46. arXiv:1807.03379  [pdf, other]

    cs.LG stat.ML

    Online Scoring with Delayed Information: A Convex Optimization Viewpoint

    Authors: Avishek Ghosh, Kannan Ramchandran

    Abstract: We consider a system where agents enter in an online fashion and are evaluated based on their attributes or context vectors. There can be practical situations where this context is partially observed, and the unobserved part comes after some delay. We assume that an agent, once left, cannot re-enter the system. Therefore, the job of the system is to provide an estimated score for the agent based o…

    Submitted 9 July, 2018; originally announced July 2018.

    Comments: 8 pages, 4 figures

  47. arXiv:1807.02253  [pdf, other]

    cs.DC cs.IT

    Faster Data-access in Large-scale Systems: Network-scale Latency Analysis under General Service-time Distributions

    Authors: Avishek Ghosh, Kannan Ramchandran

    Abstract: In cloud storage systems with a large number of servers, files are typically not stored in single servers. Instead, they are split, replicated (to ensure reliability in case of server malfunction) and stored in different servers. We analyze the mean latency of such a split-and-replicate cloud storage system under general sub-exponential service time. We present a novel scheduling scheme that utili…

    Submitted 6 July, 2018; originally announced July 2018.

    Comments: 12 pages, 7 figures

  48. arXiv:1806.06766  [pdf, ps, other]

    cs.DS eess.SY

    Matching Observations to Distributions: Efficient Estimation via Sparsified Hungarian Algorithm

    Authors: Sinho Chewi, Forest Yang, Avishek Ghosh, Abhay Parekh, Kannan Ramchandran

    Abstract: Suppose we are given observations, where each observation is drawn independently from one of $k$ known distributions. The goal is to match each observation to the distribution from which it was drawn. We observe that the maximum likelihood estimator (MLE) for this problem can be computed using weighted bipartite matching, even when $n$, the number of observations per distribution, exceeds one. Thi…

    Submitted 29 September, 2019; v1 submitted 18 June, 2018; originally announced June 2018.

    Comments: 8 pages, 1 figure; to appear in the 57th Annual Allerton Conference on Communication, Control, and Computing

    MSC Class: 68W20

  49. arXiv:1806.06035  [pdf, other]

    math.OC cs.MA eess.SY

    Customized Local Differential Privacy for Multi-Agent Distributed Optimization

    Authors: Roel Dobbe, Ye Pu, Jingge Zhu, Kannan Ramchandran, Claire Tomlin

    Abstract: Real-time data-driven optimization and control problems over networks may require sensitive information of participating users to calculate solutions and decision variables, such as in traffic or energy systems. Adversaries with access to coordination signals may potentially decode information on individual users and put user privacy at risk. We develop local differential privacy, which is a stron…

    Submitted 22 May, 2020; v1 submitted 15 June, 2018; originally announced June 2018.

    Comments: A shorter version of this paper will appear in the proceedings of ISGT-Europe 2020 in The Hague

  50. arXiv:1806.05358  [pdf, ps, other]

    cs.LG cs.CR cs.DC math.OC stat.ML

    Defending Against Saddle Point Attack in Byzantine-Robust Distributed Learning

    Authors: Dong Yin, Yudong Chen, Kannan Ramchandran, Peter Bartlett

    Abstract: We study robust distributed learning that involves minimizing a non-convex loss function with saddle points. We consider the Byzantine setting where some worker machines have abnormal or even arbitrary and adversarial behavior. In this setting, the Byzantine machines may create fake local minima near a saddle point that is far away from any true local minimum, even when robust gradient estimators…

    Submitted 29 July, 2020; v1 submitted 14 June, 2018; originally announced June 2018.

    Comments: ICML 2019