Showing 1–50 of 87 results for author: Braverman, V

  1. KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

    Authors: Zirui Liu, Jiayi Yuan, Hongye Jin, Shaochen Zhong, Zhaozhuo Xu, Vladimir Braverman, Beidi Chen, Xia Hu

    Abstract: Efficiently serving large language models (LLMs) requires batching many requests together to reduce the cost per request. Yet, the key-value (KV) cache, which stores attention keys and values to avoid re-computations, significantly increases memory demands and becomes the new bottleneck in speed and memory usage. This memory demand increases with larger batch sizes and longer context lengths. Addi…

    Submitted 5 February, 2024; originally announced February 2024.

  2. arXiv:2312.13385  [pdf, other]

    cs.RO cs.LG

    ORBSLAM3-Enhanced Autonomous Toy Drones: Pioneering Indoor Exploration

    Authors: Murad Tukan, Fares Fares, Yotam Grufinkle, Ido Talmor, Loay Mualem, Vladimir Braverman, Dan Feldman

    Abstract: Navigating toy drones through uncharted GPS-denied indoor spaces poses significant difficulties due to their reliance on GPS for location determination. In such circumstances, the necessity for achieving proper navigation is a primary concern. In response to this formidable challenge, we introduce a real-time autonomous indoor exploration system tailored for drones equipped with a monocular \emph{…

    Submitted 20 December, 2023; originally announced December 2023.

  3. arXiv:2310.08391  [pdf, other]

    stat.ML cs.LG

    How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?

    Authors: Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Peter L. Bartlett

    Abstract: Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities, enabling them to solve unseen tasks solely based on input contexts without adjusting model parameters. In this paper, we study ICL in one of its simplest setups: pretraining a linearly parameterized single-layer linear attention model for linear regression with a Gaussian prior. We establish a stati…

    Submitted 14 March, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: ICLR 2024 Camera Ready

  4. arXiv:2307.05834  [pdf, other]

    cs.LG cs.AI

    Scaling Distributed Multi-task Reinforcement Learning with Experience Sharing

    Authors: Sanae Amani, Khushbu Pahwa, Vladimir Braverman, Lin F. Yang

    Abstract: Recently, DARPA launched the ShELL program, which aims to explore how experience sharing can benefit distributed lifelong learning agents in adapting to new challenges. In this paper, we address this issue by conducting both theoretical and empirical research on distributed multi-task reinforcement learning (RL), where a group of $N$ agents collaboratively solves $M$ tasks without prior knowledge…

    Submitted 11 July, 2023; originally announced July 2023.

  5. arXiv:2307.04249  [pdf, other]

    cs.DS

    Private Data Stream Analysis for Universal Symmetric Norm Estimation

    Authors: Vladimir Braverman, Joel Manning, Zhiwei Steven Wu, Samson Zhou

    Abstract: We study how to release summary statistics on a data stream subject to the constraint of differential privacy. In particular, we focus on releasing the family of symmetric norms, which are invariant under sign-flips and coordinate-wise permutations on an input data stream and include $L_p$ norms, $k$-support norms, top-$k$ norms, and the box norm as special cases. Although it may be possible to de…

    Submitted 9 July, 2023; originally announced July 2023.

  6. arXiv:2306.09396  [pdf, other]

    cs.DS cs.LG

    Private Federated Frequency Estimation: Adapting to the Hardness of the Instance

    Authors: Jingfeng Wu, Wennan Zhu, Peter Kairouz, Vladimir Braverman

    Abstract: In federated frequency estimation (FFE), multiple clients work together to estimate the frequencies of their collective data by communicating with a server that respects the privacy constraints of Secure Summation (SecSum), a cryptographic multi-party computation protocol that ensures that the server can only access the sum of client-held vectors. For single-round FFE, it is known that count sketc…

    Submitted 2 December, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 camera ready version

  7. arXiv:2306.05310  [pdf, other]

    cs.LG

    A framework for dynamically training and adapting deep reinforcement learning models to different, low-compute, and continuously changing radiology deployment environments

    Authors: Guangyao Zheng, Shuhao Lai, Vladimir Braverman, Michael A. Jacobs, Vishwa S. Parekh

    Abstract: While Deep Reinforcement Learning has been widely researched in medical imaging, the training and deployment of these models usually require powerful GPUs. Since imaging environments evolve rapidly and can be generated by edge devices, the algorithm is required to continually learn and adapt to changing environments, and adjust to low-compute devices. To this end, we developed three image coreset…

    Submitted 8 June, 2023; originally announced June 2023.

  8. arXiv:2306.00188  [pdf, other]

    cs.LG cs.CV eess.IV

    Multi-environment lifelong deep reinforcement learning for medical imaging

    Authors: Guangyao Zheng, Shuhao Lai, Vladimir Braverman, Michael A. Jacobs, Vishwa S. Parekh

    Abstract: Deep reinforcement learning (DRL) is increasingly being explored in medical imaging. However, the environments for medical imaging tasks are constantly evolving in terms of imaging orientations, imaging sequences, and pathologies. To that end, we developed a Lifelong DRL framework, SERIL, to continually learn new tasks in changing imaging environments without catastrophic forgetting. SERIL was devel…

    Submitted 31 May, 2023; originally announced June 2023.

  9. arXiv:2305.11980  [pdf, other]

    cs.LG

    AutoCoreset: An Automatic Practical Coreset Construction Framework

    Authors: Alaa Maalouf, Murad Tukan, Vladimir Braverman, Daniela Rus

    Abstract: A coreset is a tiny weighted subset of an input set that closely resembles the loss function with respect to a certain set of queries. Coresets have become prevalent in machine learning, as they have been shown to be advantageous for many applications. While coreset research is an active research area, unfortunately, coresets are constructed in a problem-dependent manner, where for each problem, a new core…

    Submitted 19 May, 2023; originally announced May 2023.

  10. arXiv:2305.11788  [pdf, other]

    cs.LG stat.ML

    Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability

    Authors: Jingfeng Wu, Vladimir Braverman, Jason D. Lee

    Abstract: Recent research has observed that in machine learning optimization, gradient descent (GD) often operates at the edge of stability (EoS) [Cohen, et al., 2021], where the stepsizes are set to be large, resulting in non-monotonic losses induced by the GD iterates. This paper studies the convergence and implicit bias of constant-stepsize GD for logistic regression on linearly separable data in the EoS…

    Submitted 15 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023 camera ready version

  11. arXiv:2303.16287  [pdf, ps, other]

    cs.DS

    Lower Bounds for Pseudo-Deterministic Counting in a Stream

    Authors: Vladimir Braverman, Robert Krauthgamer, Aditya Krishnan, Shay Sapir

    Abstract: Many streaming algorithms provide only a high-probability relative approximation. These two relaxations, of allowing approximation and randomization, seem necessary -- for many streaming problems, both relaxations must be employed simultaneously, to avoid an exponentially larger (and often trivial) space complexity. A common drawback of these randomized approximate algorithms is that independent e…

    Submitted 15 May, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

    Comments: 14 pages, ICALP2023

  12. arXiv:2303.10263  [pdf, other]

    cs.LG

    Fixed Design Analysis of Regularization-Based Continual Learning

    Authors: Haoran Li, Jingfeng Wu, Vladimir Braverman

    Abstract: We consider a continual learning (CL) problem with two linear regression tasks in the fixed design setting, where the feature vectors are assumed fixed and the labels are assumed to be random variables. We consider an $\ell_2$-regularized CL algorithm, which computes an Ordinary Least Squares parameter to fit the first dataset, then computes another parameter that fits the second dataset under an…

    Submitted 18 June, 2024; v1 submitted 17 March, 2023; originally announced March 2023.

    Comments: CoLLAs 2023 camera-ready version

  13. arXiv:2303.06783  [pdf, other]

    cs.LG cs.CV eess.IV

    Asynchronous Decentralized Federated Lifelong Learning for Landmark Localization in Medical Imaging

    Authors: Guangyao Zheng, Michael A. Jacobs, Vladimir Braverman, Vishwa S. Parekh

    Abstract: Federated learning is a recent development in the machine learning area that allows a system of devices to train on one or more tasks without sharing their data to a single location or device. However, this framework still requires a centralized global model to consolidate individual models into one, and the devices train synchronously, both of which can be potential bottlenecks for using federated l…

    Submitted 10 January, 2024; v1 submitted 12 March, 2023; originally announced March 2023.

  14. arXiv:2303.05151  [pdf, other]

    cs.LG cs.AI

    Provable Data Subset Selection For Efficient Neural Network Training

    Authors: Murad Tukan, Samson Zhou, Alaa Maalouf, Daniela Rus, Vladimir Braverman, Dan Feldman

    Abstract: Radial basis function neural networks (\emph{RBFNN}) are well-known for their capability to approximate any continuous function on a closed bounded set with arbitrary precision given enough hidden neurons. In this paper, we introduce the first algorithm to construct coresets for \emph{RBFNNs}, i.e., small weighted subsets that approximate the loss of the input data on any radial basis function n…

    Submitted 9 March, 2023; originally announced March 2023.

  15. arXiv:2303.02255  [pdf, other]

    cs.LG math.OC stat.ML

    Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron

    Authors: Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

    Abstract: This paper considers the problem of learning a single ReLU neuron with squared loss (a.k.a., ReLU regression) in the overparameterized regime, where the input dimension can exceed the number of samples. We analyze a Perceptron-type algorithm called GLM-tron (Kakade et al., 2011) and provide its dimension-free risk upper bounds for high-dimensional ReLU regression in both well-specified and misspec…

    Submitted 26 June, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

    Comments: ICML 2023 camera ready

  16. arXiv:2302.11510  [pdf, other]

    cs.LG cs.CV

    Selective experience replay compression using coresets for lifelong deep reinforcement learning in medical imaging

    Authors: Guangyao Zheng, Samson Zhou, Vladimir Braverman, Michael A. Jacobs, Vishwa S. Parekh

    Abstract: Selective experience replay is a popular strategy for integrating lifelong learning with deep reinforcement learning. Selective experience replay aims to recount selected experiences from previous tasks to avoid catastrophic forgetting. Furthermore, selective experience replay based techniques are model agnostic and allow experiences to be shared across different models. However, storing experienc…

    Submitted 9 January, 2024; v1 submitted 22 February, 2023; originally announced February 2023.

  17. arXiv:2209.12054  [pdf, other]

    stat.ML cs.LG

    From Local to Global: Spectral-Inspired Graph Neural Networks

    Authors: Ningyuan Huang, Soledad Villar, Carey E. Priebe, Da Zheng, Chengyue Huang, Lin Yang, Vladimir Braverman

    Abstract: Graph Neural Networks (GNNs) are powerful deep learning methods for Non-Euclidean data. Popular GNNs are message-passing algorithms (MPNNs) that aggregate and combine signals in a local graph neighborhood. However, shallow MPNNs tend to miss long-range signals and perform poorly on some heterophilous graphs, while deep MPNNs can suffer from issues like over-smoothing or over-squashing. To mitigate…

    Submitted 4 November, 2022; v1 submitted 24 September, 2022; originally announced September 2022.

    Comments: Accepted for publication at the NeurIPS 2022 GLFrontiers Workshop

  18. arXiv:2209.01901  [pdf, ps, other]

    cs.DS

    The Power of Uniform Sampling for Coresets

    Authors: Vladimir Braverman, Vincent Cohen-Addad, Shaofeng H.-C. Jiang, Robert Krauthgamer, Chris Schwiegelshohn, Mads Bech Toftrup, Xuan Wu

    Abstract: Motivated by practical generalizations of the classic $k$-median and $k$-means objectives, such as clustering with size constraints, fair clustering, and Wasserstein barycenter, we introduce a meta-theorem for designing coresets for constrained-clustering problems. The meta-theorem reduces the task of coreset construction to one on a bounded number of ring instances with a much-relaxed additive er…

    Submitted 17 September, 2022; v1 submitted 5 September, 2022; originally announced September 2022.

  19. arXiv:2208.01857  [pdf, other]

    cs.LG math.OC stat.ML

    The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift

    Authors: Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

    Abstract: We study linear regression under covariate shift, where the marginal distribution over the input covariates differs in the source and the target domains, while the conditional distribution of the output given the input covariates is similar across the two domains. We investigate a transfer learning approach with pretraining on the source data and finetuning based on the target data (both conducted…

    Submitted 3 August, 2022; originally announced August 2022.

    Comments: 32 pages, 1 figure, 1 table

  20. arXiv:2206.02291  [pdf, other]

    cs.CL

    Pretrained Models for Multilingual Federated Learning

    Authors: Orion Weller, Marc Marone, Vladimir Braverman, Dawn Lawrie, Benjamin Van Durme

    Abstract: Since the advent of Federated Learning (FL), research has applied these methods to natural language processing (NLP) tasks. Despite a plethora of papers in FL for NLP, no previous works have studied how multilingual text impacts FL algorithms. Furthermore, multilingual text provides an interesting avenue to examine the impact of non-IID text (e.g. different languages) on FL in naturally occurring…

    Submitted 5 June, 2022; originally announced June 2022.

    Comments: NAACL 2022

  21. arXiv:2204.09136  [pdf, other]

    cs.DS

    The White-Box Adversarial Data Stream Model

    Authors: Miklos Ajtai, Vladimir Braverman, T. S. Jayram, Sandeep Silwal, Alec Sun, David P. Woodruff, Samson Zhou

    Abstract: We study streaming algorithms in the white-box adversarial model, where the stream is chosen adaptively by an adversary who observes the entire internal state of the algorithm at each time step. We show that nontrivial algorithms are still possible. We first give a randomized algorithm for the $L_1$-heavy hitters problem that outperforms the optimal deterministic Misra-Gries algorithm on long stre…

    Submitted 23 July, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

    Comments: PODS 2022

  22. arXiv:2203.06514  [pdf, other]

    cs.LG cs.AI cs.CV

    Sparsity and Heterogeneous Dropout for Continual Learning in the Null Space of Neural Activations

    Authors: Ali Abbasi, Parsa Nooralinejad, Vladimir Braverman, Hamed Pirsiavash, Soheil Kolouri

    Abstract: Continual/lifelong learning from a non-stationary input data stream is a cornerstone of intelligence. Despite their phenomenal performance in a wide variety of applications, deep neural networks are prone to forgetting their previously learned information upon learning new ones. This phenomenon is called "catastrophic forgetting" and is deeply rooted in the stability-plasticity dilemma. Overcoming…

    Submitted 8 July, 2022; v1 submitted 12 March, 2022; originally announced March 2022.

  23. arXiv:2203.04370  [pdf, other]

    cs.LG

    New Coresets for Projective Clustering and Applications

    Authors: Murad Tukan, Xuan Wu, Samson Zhou, Vladimir Braverman, Dan Feldman

    Abstract: $(j,k)$-projective clustering is the natural generalization of the family of $k$-clustering and $j$-subspace clustering problems. Given a set of points $P$ in $\mathbb{R}^d$, the goal is to find $k$ flats of dimension $j$, i.e., affine subspaces, that best fit $P$ under a given distance measure. In this paper, we propose the first algorithm that returns an $L_\infty…

    Submitted 8 March, 2022; originally announced March 2022.

  24. arXiv:2203.03159  [pdf, other]

    cs.LG math.OC stat.ML

    Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime

    Authors: Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

    Abstract: Stochastic gradient descent (SGD) has achieved great success due to its superior performance in both optimization and generalization. Most existing generalization analyses are made for single-pass SGD, which is a less practical variant compared to the commonly used multi-pass SGD. Besides, theoretical analyses for multi-pass SGD often concern a worst-case instance in a class of problems, which…

    Submitted 7 March, 2022; originally announced March 2022.

    Comments: 28 pages, 2 figures

  25. arXiv:2112.10001  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Cross-Domain Federated Learning in Medical Imaging

    Authors: Vishwa S Parekh, Shuhao Lai, Vladimir Braverman, Jeff Leal, Steven Rowe, Jay J Pillai, Michael A Jacobs

    Abstract: Federated learning is increasingly being explored in the field of medical imaging to train deep learning models on large scale datasets distributed across different data centers while preserving privacy by avoiding the need to transfer sensitive patient information. In this manuscript, we explore federated learning in a multi-domain, multi-task setting wherein different participating nodes may con…

    Submitted 18 December, 2021; originally announced December 2021.

    Comments: Under Review for MIDL 2022

  26. arXiv:2110.06198  [pdf, other]

    cs.LG math.OC stat.ML

    Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression

    Authors: Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

    Abstract: Stochastic gradient descent (SGD) has been shown to generalize well in many deep learning applications. In practice, one often runs SGD with a geometrically decaying stepsize, i.e., a constant initial stepsize followed by multiple geometric stepsize decay, and uses the last iterate as the output. This kind of SGD is known to be nearly minimax optimal for classical finite-dimensional linear regress…

    Submitted 11 July, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: 35 pages, 2 figures, 1 table. In ICML 2022

  27. arXiv:2109.01635  [pdf, other]

    cs.DS

    Symmetric Norm Estimation and Regression on Sliding Windows

    Authors: Vladimir Braverman, Viska Wei, Samson Zhou

    Abstract: The sliding window model generalizes the standard streaming model and often performs better in applications where recent data is more important or more accurate than data that arrived prior to a certain time. We study the problem of approximating symmetric norms (a norm on $\mathbb{R}^n$ that is invariant under sign-flips and coordinate-wise permutations) in the sliding window model, where only th…

    Submitted 3 September, 2021; originally announced September 2021.

    Comments: COCOON 2021

  28. arXiv:2108.05439  [pdf, other]

    cs.LG

    Gap-Dependent Unsupervised Exploration for Reinforcement Learning

    Authors: Jingfeng Wu, Vladimir Braverman, Lin F. Yang

    Abstract: For the problem of task-agnostic reinforcement learning (RL), an agent first collects samples from an unknown environment without the supervision of reward signals, then is revealed with a reward and is asked to compute a corresponding near-optimal policy. Existing approaches mainly concern the worst-case scenarios, in which no structural information of the reward/transition-dynamics is utilized.…

    Submitted 14 March, 2022; v1 submitted 11 August, 2021; originally announced August 2021.

    Comments: AISTATS 2022 camera ready version

  29. arXiv:2108.04552  [pdf, other]

    cs.LG math.OC stat.ML

    The Benefits of Implicit Regularization from SGD in Least Squares Problems

    Authors: Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Dean P. Foster, Sham M. Kakade

    Abstract: Stochastic gradient descent (SGD) exhibits strong algorithmic regularization effects in practice, which has been hypothesized to play an important role in the generalization of modern machine learning approaches. In this work, we seek to understand these issues in the simpler setting of linear regression (including both underparameterized and overparameterized regimes), where our goal is to make s…

    Submitted 10 July, 2022; v1 submitted 10 August, 2021; originally announced August 2021.

    Comments: 33 pages, 1 figure. In NeurIPS 2021

  30. arXiv:2106.16112  [pdf, other]

    cs.DS

    Coresets for Clustering with Missing Values

    Authors: Vladimir Braverman, Shaofeng H.-C. Jiang, Robert Krauthgamer, Xuan Wu

    Abstract: We provide the first coreset for clustering points in $\mathbb{R}^d$ that have multiple missing values (coordinates). Previous coreset constructions only allow one missing coordinate. The challenge in this setting is that objective functions, like $k$-Means, are evaluated only on the set of available (non-missing) coordinates, which varies across points. Recall that an $ε$-coreset of a large datas…

    Submitted 11 November, 2021; v1 submitted 30 June, 2021; originally announced June 2021.

  31. arXiv:2106.14952  [pdf, other]

    cs.LG cs.DS

    Adversarial Robustness of Streaming Algorithms through Importance Sampling

    Authors: Vladimir Braverman, Avinatan Hassidim, Yossi Matias, Mariano Schain, Sandeep Silwal, Samson Zhou

    Abstract: In this paper, we introduce adversarially robust streaming algorithms for central machine learning and algorithmic tasks, such as regression and clustering, as well as their more general counterparts, subspace embedding, low-rank approximation, and coreset construction. For regression and other numerical linear algebra related tasks, we consider the row arrival streaming model. Our results are bas…

    Submitted 25 October, 2021; v1 submitted 28 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021

  32. arXiv:2104.08604  [pdf, other]

    cs.LG

    Lifelong Learning with Sketched Structural Regularization

    Authors: Haoran Li, Aditya Krishnan, Jingfeng Wu, Soheil Kolouri, Praveen K. Pilly, Vladimir Braverman

    Abstract: Preventing catastrophic forgetting while continually learning new tasks is an essential problem in lifelong learning. Structural regularization (SR) refers to a family of algorithms that mitigate catastrophic forgetting by penalizing the network for changing its "critical parameters" from previous tasks while learning a new one. The penalty is often induced via a quadratic regularizer defined by a…

    Submitted 17 April, 2021; originally announced April 2021.

  33. Sublinear Time Spectral Density Estimation

    Authors: Vladimir Braverman, Aditya Krishnan, Christopher Musco

    Abstract: We present a new sublinear time algorithm for approximating the spectral density (eigenvalue distribution) of an $n\times n$ normalized graph adjacency or Laplacian matrix. The algorithm recovers the spectrum up to $ε$ accuracy in the Wasserstein-1 distance in $O(n\cdot \text{poly}(1/ε))$ time given sample access to the graph. This result complements recent work by David Cohen-Steiner, Weihao Kong…

    Submitted 14 April, 2022; v1 submitted 7 April, 2021; originally announced April 2021.

    Comments: Accepted to STOC'22

  34. arXiv:2103.12692  [pdf, other]

    cs.LG math.OC stat.ML

    Benign Overfitting of Constant-Stepsize SGD for Linear Regression

    Authors: Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

    Abstract: There is an increasing realization that algorithmic inductive biases are central in preventing overfitting; empirically, we often see a benign overfitting phenomenon in overparameterized settings for natural learning algorithms, such as stochastic gradient descent (SGD), where little to no explicit regularization has been employed. This work considers this issue in arguably the most basic setting:…

    Submitted 12 October, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

    Comments: 56 pages, 2 figures. A short version is accepted at the 34th Annual Conference on Learning Theory (COLT 2021)

  35. arXiv:2011.13034  [pdf, other]

    cs.LG stat.ML

    Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning

    Authors: Jingfeng Wu, Vladimir Braverman, Lin F. Yang

    Abstract: In this paper we consider multi-objective reinforcement learning where the objectives are balanced using preferences. In practice, the preferences are often given in an adversarial manner, e.g., customers can be picky in many applications. We formalize this problem as an episodic learning problem on a Markov decision process, where transitions are unknown and a reward function is the inner product…

    Submitted 27 October, 2021; v1 submitted 25 November, 2020; originally announced November 2020.

    Comments: NeurIPS 2021 Camera Ready Version

  36. arXiv:2011.06103  [pdf, other]

    cs.DC astro-ph.SR cs.LG

    Sketch and Scale: Geo-distributed tSNE and UMAP

    Authors: Viska Wei, Nikita Ivkin, Vladimir Braverman, Alexander Szalay

    Abstract: Running machine learning analytics over geographically distributed datasets is a rapidly arising problem in the world of data management policies ensuring privacy and data security. Visualizing high dimensional data using tools such as t-distributed Stochastic Neighbor Embedding (tSNE) and Uniform Manifold Approximation and Projection (UMAP) became common practice for data scientists. Both tools s…

    Submitted 11 November, 2020; originally announced November 2020.

    Comments: IEEE BigData2020 conference

  37. arXiv:2011.02538  [pdf, other]

    cs.LG stat.ML

    Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate

    Authors: Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu

    Abstract: Understanding the algorithmic bias of \emph{stochastic gradient descent} (SGD) is one of the key challenges in modern machine learning and deep learning theory. Most of the existing works, however, focus on \emph{very small or even infinitesimal} learning rate regime, and fail to cover practical scenarios where the learning rate is \emph{moderate and annealing}. In this paper, we make an initial a…

    Submitted 29 March, 2021; v1 submitted 4 November, 2020; originally announced November 2020.

    Comments: ICLR 2021 Camera Ready

  38. arXiv:2011.01777  [pdf, ps, other]

    cs.DS

    Near-Optimal Entrywise Sampling of Numerically Sparse Matrices

    Authors: Vladimir Braverman, Robert Krauthgamer, Aditya Krishnan, Shay Sapir

    Abstract: Many real-world data sets are sparse or almost sparse. One method to measure this for a matrix $A\in \mathbb{R}^{n\times n}$ is the \emph{numerical sparsity}, denoted $\mathsf{ns}(A)$, defined as the minimum $k\geq 1$ such that $\|a\|_1/\|a\|_2 \leq \sqrt{k}$ for every row and every column $a$ of $A$. This measure of $a$ is smooth and is clearly only smaller than the number of non-zeros in the row…

    Submitted 5 July, 2021; v1 submitted 3 November, 2020; originally announced November 2020.

    Comments: 20 pages. To appear in COLT 2021

  39. arXiv:2008.08316  [pdf, other]

    cs.LG cs.AI stat.ML

    Data-Independent Structured Pruning of Neural Networks via Coresets

    Authors: Ben Mussay, Daniel Feldman, Samson Zhou, Vladimir Braverman, Margarita Osadchy

    Abstract: Model compression is crucial for deployment of neural networks on devices with limited computational and memory resources. Many different methods show comparable accuracy of the compressed model and similar compression rates. However, the majority of the compression methods are based on heuristics and offer no worst-case guarantees on the trade-off between the compression rate and the approximatio…

    Submitted 19 August, 2020; originally announced August 2020.

  40. arXiv:2008.06736  [pdf, other]

    cs.LG stat.ML

    Obtaining Adjustable Regularization for Free via Iterate Averaging

    Authors: Jingfeng Wu, Vladimir Braverman, Lin F. Yang

    Abstract: Regularization for optimization is a crucial technique to avoid overfitting in machine learning. In order to obtain the best performance, we usually train a model by tuning the regularization parameters. It becomes costly, however, when a single round of training takes a significant amount of time. Very recently, Neu and Rosasco show that if we run stochastic gradient descent (SGD) on linear regress…

    Submitted 15 August, 2020; originally announced August 2020.

    Comments: ICML 2020 camera ready

  41. arXiv:2007.07682  [pdf, other]

    cs.LG stat.ML

    FetchSGD: Communication-Efficient Federated Learning with Sketching

    Authors: Daniel Rothchild, Ashwinee Panda, Enayat Ullah, Nikita Ivkin, Ion Stoica, Vladimir Braverman, Joseph Gonzalez, Raman Arora

    Abstract: Existing approaches to federated learning suffer from a communication bottleneck as well as convergence issues due to sparse client participation. In this paper we introduce a novel algorithm, called FetchSGD, to overcome these challenges. FetchSGD compresses model updates using a Count Sketch, and then takes advantage of the mergeability of sketches to combine model updates from many workers. A k…

    Submitted 7 October, 2020; v1 submitted 15 July, 2020; originally announced July 2020.

  42. arXiv:2004.07718  [pdf, ps, other]

    cs.DS

    Coresets for Clustering in Excluded-minor Graphs and Beyond

    Authors: Vladimir Braverman, Shaofeng H.-C. Jiang, Robert Krauthgamer, Xuan Wu

    Abstract: Coresets are modern data-reduction tools that are widely used in data analysis to improve efficiency in terms of running time, space and communication complexity. Our main result is a fast algorithm to construct a small coreset for k-Median in (the shortest-path metric of) an excluded-minor graph. Specifically, we give the first coreset of size that depends only on $k$, $ε$ and the excluded-minor…

    Submitted 15 July, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

  43. arXiv:2002.06296  [pdf, other]

    cs.DS

    Sparse Coresets for SVD on Infinite Streams

    Authors: Vladimir Braverman, Dan Feldman, Harry Lang, Daniela Rus, Adiel Statman

    Abstract: In streaming Singular Value Decomposition (SVD), $d$-dimensional rows of a possibly infinite matrix arrive sequentially as points in $\mathbb{R}^d$. An $ε$-coreset is a (much smaller) matrix whose sum of square distances of the rows to any hyperplane approximates that of the original matrix to a $1 \pm ε$ factor. Our main result is that we can maintain an $ε$-coreset while storing only…

    Submitted 26 November, 2020; v1 submitted 14 February, 2020; originally announced February 2020.

  44. arXiv:1912.11432  [pdf, ps, other]

    astro-ph.GA

    Six Dimensional Streaming Algorithm for Cluster Finding in N-Body Simulations

    Authors: Aidan Reilly, Nikita Ivkin, Gerard Lemson, Vladimir Braverman, Alexander Szalay

    Abstract: Cosmological N-body simulations are crucial for understanding how the Universe evolves. Studying large-scale distributions of matter in these simulations and comparing them to observations usually involves detecting dense clusters of particles called "halos," which are gravitationally bound and expected to form galaxies. However, traditional cluster finders are computationally expensive and use m…

    Submitted 24 December, 2019; originally announced December 2019.

    Comments: 4 pages, 2 figures, to be published in Astronomical Data Analysis Software and Systems XXIX. ASP Conference Series, proceedings of a conference held (6-10 October 2019) at The Martini Plaza, Groningen, The Netherlands

  45. arXiv:1911.06951  [pdf, other]

    cs.DS cs.NI

    Memory-Efficient Performance Monitoring on Programmable Switches with Lean Algorithms

    Authors: Zaoxing Liu, Samson Zhou, Ori Rottenstreich, Vladimir Braverman, Jennifer Rexford

    Abstract: Network performance problems are notoriously difficult to diagnose. Prior profiling systems collect performance statistics by keeping information about each network flow, but maintaining per-flow state is not scalable on resource-constrained NIC and switch hardware. Instead, we propose sketch-based performance monitoring using memory that is sublinear in the number of flows. Existing sketches esti…

    Submitted 15 November, 2019; originally announced November 2019.

    Comments: To appear at APoCS 2020

  46. arXiv:1908.00175  [pdf]

    eess.IV cs.LG physics.med-ph

    Multiparametric Deep Learning Tissue Signatures for Muscular Dystrophy: Preliminary Results

    Authors: Alex E. Bocchieri, Vishwa S. Parekh, Kathryn R. Wagner, Shivani Ahlawat, Vladimir Braverman, Doris G. Leung, Michael A. Jacobs

    Abstract: A current clinical challenge is identifying limb girdle muscular dystrophy 2I (LGMD2I) tissue changes in the thighs, in particular, separating fat, fat-infiltrated muscle, and muscle tissue. Deep learning algorithms have the ability to learn different features by using the inherent tissue contrasts from multiparametric magnetic resonance imaging (mpMRI). To that end, we developed a novel multiparame…

    Submitted 31 July, 2019; originally announced August 2019.

    Comments: 6 pages, 3 figures. MIDL 2019 [arXiv:1907.08612]

    Report number: MIDL/2019/ExtendedAbstract/H1g3ICh4cV

  47. arXiv:1907.07574  [pdf, ps, other]

    cs.DS

    Improved Algorithms for Time Decay Streams

    Authors: Vladimir Braverman, Harry Lang, Enayat Ullah, Samson Zhou

    Abstract: In the time-decay model for data streams, elements of an underlying data set arrive sequentially with the recently arrived elements being more important. A common approach for handling large data sets is to maintain a \emph{coreset}, a succinct summary of the processed data that allows approximate recovery of a predetermined query. We provide a general framework that takes any offline-coreset and…

    Submitted 17 July, 2019; originally announced July 2019.

    Comments: To appear at APPROX 2019

  48. arXiv:1907.05457  [pdf, ps, other]

    cs.DS

    Schatten Norms in Matrix Streams: Hello Sparsity, Goodbye Dimension

    Authors: Vladimir Braverman, Robert Krauthgamer, Aditya Krishnan, Roi Sinoff

    Abstract: Spectral functions of large matrices contain important structural information about the underlying data, and are thus becoming increasingly important. Many times, large matrices representing real-world data are \emph{sparse} or \emph{doubly sparse} (i.e., sparse in both rows and columns), and are accessed as a \emph{stream} of updates, typically organized in \emph{row-order}. In this setting, wher…

    Submitted 27 February, 2020; v1 submitted 11 July, 2019; originally announced July 2019.

    Comments: 39 pages

  49. arXiv:1907.04733  [pdf, other]

    cs.DS

    Coresets for Clustering in Graphs of Bounded Treewidth

    Authors: Daniel Baker, Vladimir Braverman, Lingxiao Huang, Shaofeng H. -C. Jiang, Robert Krauthgamer, Xuan Wu

    Abstract: We initiate the study of coresets for clustering in graph metrics, i.e., the shortest-path metric of edge-weighted graphs. Such clustering problems are essential to data analysis and used for example in road networks and data visualization. A coreset is a compact summary of the data that approximately preserves the clustering objective for every possible center set, and it offers significant effic…

    Submitted 12 December, 2022; v1 submitted 10 July, 2019; originally announced July 2019.

  50. arXiv:1907.04018  [pdf, other]

    cs.LG stat.ML

    Data-Independent Neural Pruning via Coresets

    Authors: Ben Mussay, Margarita Osadchy, Vladimir Braverman, Samson Zhou, Dan Feldman

    Abstract: Previous work showed empirically that large neural networks can be significantly reduced in size while preserving their accuracy. Model compression became a central research topic, as it is crucial for deployment of neural networks on devices with limited computational and memory resources. The majority of the compression methods are based on heuristics and offer no worst-case guarantees on the tr…

    Submitted 3 January, 2020; v1 submitted 9 July, 2019; originally announced July 2019.