Showing 1–6 of 6 results for author: Kalan, S M M

  1. arXiv:2006.10581  [pdf, other]

    cs.LG cs.IT stat.ML

    Minimax Lower Bounds for Transfer Learning with Linear and One-hidden Layer Neural Networks

    Authors: Seyed Mohammadreza Mousavi Kalan, Zalan Fabian, A. Salman Avestimehr, Mahdi Soltanolkotabi

    Abstract: Transfer learning has emerged as a powerful technique for improving the performance of machine learning models on new domains where labeled training data may be scarce. In this approach a model trained for a source task, where plenty of labeled training data is available, is used as a starting point for training a model on a related target task with only few labeled training data. Despite recent e…

    Submitted 16 June, 2020; originally announced June 2020.

  2. arXiv:1901.06587  [pdf, other]

    cs.LG cs.DC cs.IT stat.ML

    Fitting ReLUs via SGD and Quantized SGD

    Authors: Seyed Mohammadreza Mousavi Kalan, Mahdi Soltanolkotabi, A. Salman Avestimehr

    Abstract: In this paper we focus on the problem of finding the optimal weights of the shallowest of neural networks consisting of a single Rectified Linear Unit (ReLU). These functions are of the form $\mathbf{x}\rightarrow \max(0,\langle\mathbf{w},\mathbf{x}\rangle)$ with $\mathbf{w}\in\mathbb{R}^d$ denoting the weight vector. We focus on a planted model where the inputs are chosen i.i.d. from a Gaussian d…

    Submitted 1 April, 2019; v1 submitted 19 January, 2019; originally announced January 2019.

  3. arXiv:1806.00939  [pdf, other]

    cs.IT cs.DC cs.LG

    Lagrange Coded Computing: Optimal Design for Resiliency, Security and Privacy

    Authors: Qian Yu, Songze Li, Netanel Raviv, Seyed Mohammadreza Mousavi Kalan, Mahdi Soltanolkotabi, Salman Avestimehr

    Abstract: We consider a scenario involving computations over a massive dataset stored distributedly across multiple workers, which is at the core of distributed learning algorithms. We propose Lagrange Coded Computing (LCC), a new framework to simultaneously provide (1) resiliency against stragglers that may prolong computations; (2) security against Byzantine (or malicious) workers that deliberately modify…

    Submitted 1 April, 2019; v1 submitted 3 June, 2018; originally announced June 2018.

  4. arXiv:1805.09934  [pdf, other]

    cs.IT cs.DC cs.LG

    Polynomially Coded Regression: Optimal Straggler Mitigation via Data Encoding

    Authors: Songze Li, Seyed Mohammadreza Mousavi Kalan, Qian Yu, Mahdi Soltanolkotabi, A. Salman Avestimehr

    Abstract: We consider the problem of training a least-squares regression model on a large dataset using gradient descent. The computation is carried out on a distributed system consisting of a master node and multiple worker nodes. Such distributed systems are significantly slowed down due to the presence of slow-running machines (stragglers) as well as various communication bottlenecks. We propose "polynom…

    Submitted 24 May, 2018; originally announced May 2018.

  5. arXiv:1804.00217  [pdf, ps, other]

    cs.IT cs.LG stat.ML

    Fundamental Resource Trade-offs for Encoded Distributed Optimization

    Authors: A. Salman Avestimehr, Seyed Mohammadreza Mousavi Kalan, Mahdi Soltanolkotabi

    Abstract: Dealing with the sheer size and complexity of today's massive data sets requires computational platforms that can analyze data in a parallelized and distributed fashion. A major bottleneck that arises in such modern distributed computing environments is that some of the worker nodes may run slow. These nodes, a.k.a. stragglers, can significantly slow down computation as the slowest node may dictate…

    Submitted 1 April, 2019; v1 submitted 31 March, 2018; originally announced April 2018.

  6. arXiv:1710.09990  [pdf, other]

    cs.IT cs.DC

    Near-Optimal Straggler Mitigation for Distributed Gradient Methods

    Authors: Songze Li, Seyed Mohammadreza Mousavi Kalan, A. Salman Avestimehr, Mahdi Soltanolkotabi

    Abstract: Modern learning algorithms use gradient descent updates to train inferential models that best explain data. Scaling these approaches to massive data sizes requires proper distributed gradient descent schemes where distributed worker nodes compute partial gradients based on their partial and local data sets, and send the results to a master node where all the computations are aggregated into a full…

    Submitted 27 October, 2017; originally announced October 2017.