Showing 1–4 of 4 results for author: Taheri, H

Searching in archive math.
  1. arXiv:2310.12680  [pdf, other]

    cs.LG math.OC stat.ML

    On the Optimization and Generalization of Multi-head Attention

    Authors: Puneesh Deora, Rouzbeh Ghaderi, Hossein Taheri, Christos Thrampoulidis

    Abstract: The training and generalization dynamics of the Transformer's core mechanism, namely the Attention mechanism, remain under-explored. Besides, existing analyses primarily focus on single-head attention. Inspired by the demonstrated benefits of overparameterization when training fully-connected networks, we investigate the potential optimization and generalization advantages of using multiple attent…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: 48 pages; presented at the Workshop on High-dimensional Learning Dynamics, ICML 2023

  2. arXiv:2002.07284  [pdf, other]

    math.ST cs.IT eess.SP stat.ML

    Sharp Asymptotics and Optimal Performance for Inference in Binary Models

    Authors: Hossein Taheri, Ramtin Pedarsani, Christos Thrampoulidis

    Abstract: We study convex empirical risk minimization for high-dimensional inference in binary models. Our first result sharply predicts the statistical performance of such estimators in the linear asymptotic regime under isotropic Gaussian features. Importantly, the predictions hold for a wide class of convex loss functions, which we exploit in order to prove a bound on the best achievable performance amon…

    Submitted 26 February, 2020; v1 submitted 17 February, 2020; originally announced February 2020.

  3. arXiv:1908.04433  [pdf, other]

    math.ST cs.IT cs.LG eess.SP

    Sharp Guarantees for Solving Random Equations with One-Bit Information

    Authors: Hossein Taheri, Ramtin Pedarsani, Christos Thrampoulidis

    Abstract: We study the performance of a wide class of convex optimization-based estimators for recovering a signal from corrupted one-bit measurements in high-dimensions. Our general result predicts sharply the performance of such estimators in the linear asymptotic regime when the measurement vectors have entries IID Gaussian. This includes, as a special case, the previously studied least-squares estimator…

    Submitted 23 January, 2020; v1 submitted 12 August, 2019; originally announced August 2019.

  4. arXiv:1907.10595  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    Robust and Communication-Efficient Collaborative Learning

    Authors: Amirhossein Reisizadeh, Hossein Taheri, Aryan Mokhtari, Hamed Hassani, Ramtin Pedarsani

    Abstract: We consider a decentralized learning problem, where a set of computing nodes aim at solving a non-convex optimization problem collaboratively. It is well-known that decentralized optimization schemes face two major system bottlenecks: stragglers' delay and communication overhead. In this paper, we tackle these bottlenecks by proposing a novel decentralized and gradient-based optimization algorithm…

    Submitted 31 October, 2019; v1 submitted 24 July, 2019; originally announced July 2019.