
Showing 1–5 of 5 results for author: Hakimi, I

  1. arXiv:2304.14318  [pdf, other]

    cs.CL

    q2d: Turning Questions into Dialogs to Teach Models How to Search

    Authors: Yonatan Bitton, Shlomi Cohen-Ganor, Ido Hakimi, Yoad Lewenberg, Roee Aharoni, Enav Weinreb

    Abstract: One of the exciting capabilities of recent language models for dialog is their ability to independently search for relevant information to ground a given dialog response. However, obtaining training data to teach models how to issue search queries is time- and resource-consuming. In this work, we propose q2d: an automatic data generation pipeline that generates information-seeking dialogs from ques…

    Submitted 26 December, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

    Comments: Accepted to EMNLP 2023. Website: https://question2dialog.github.io/

  2. arXiv:2106.12261  [pdf, other]

    cs.LG

    Learning Under Delayed Feedback: Implicitly Adapting to Gradient Delays

    Authors: Rotem Zamir Aviv, Ido Hakimi, Assaf Schuster, Kfir Y. Levy

    Abstract: We consider stochastic convex optimization problems, where several machines act asynchronously in parallel while sharing a common memory. We propose a robust training method for the constrained setting and derive non-asymptotic convergence guarantees that do not depend on prior knowledge of update delays, objective smoothness, and gradient variance. Conversely, existing methods for this setting cr…

    Submitted 23 June, 2021; originally announced June 2021.

    Comments: To be published in ICML 2021

  3. arXiv:1909.10802  [pdf, other]

    cs.LG stat.ML

    Gap Aware Mitigation of Gradient Staleness

    Authors: Saar Barkai, Ido Hakimi, Assaf Schuster

    Abstract: Cloud computing is becoming increasingly popular as a platform for distributed training of deep neural networks. Synchronous stochastic gradient descent (SSGD) suffers from substantial slowdowns due to stragglers if the environment is non-dedicated, as is common in cloud computing. Asynchronous SGD (ASGD) methods are immune to these slowdowns but are scarcely used due to gradient staleness, which…

    Submitted 3 February, 2020; v1 submitted 24 September, 2019; originally announced September 2019.

    Comments: Published as a conference paper at ICLR 2020

  4. arXiv:1907.11612  [pdf, other]

    cs.LG cs.DC stat.ML

    Taming Momentum in a Distributed Asynchronous Environment

    Authors: Ido Hakimi, Saar Barkai, Moshe Gabel, Assaf Schuster

    Abstract: Although distributed computing can significantly reduce the training time of deep neural networks, scaling the training process while maintaining high efficiency and final accuracy is challenging. Distributed asynchronous training enjoys near-linear speedup, but asynchrony causes gradient staleness, the main difficulty in scaling stochastic gradient descent to large clusters. Momentum, which is o…

    Submitted 14 October, 2020; v1 submitted 26 July, 2019; originally announced July 2019.

  5. arXiv:1805.08079  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Faster Neural Network Training with Approximate Tensor Operations

    Authors: Menachem Adelman, Kfir Y. Levy, Ido Hakimi, Mark Silberstein

    Abstract: We propose a novel technique for faster deep neural network training which systematically applies sample-based approximation to the constituent tensor operations, i.e., matrix multiplications and convolutions. We introduce new sampling techniques, study their theoretical properties, and prove that they provide the same convergence guarantees when applied to SGD training. We apply approximate tenso…

    Submitted 25 October, 2021; v1 submitted 21 May, 2018; originally announced May 2018.

    Comments: NeurIPS 2021 camera ready