Showing 1–50 of 67 results for author: Shalev-Shwartz, S

  1. arXiv:2405.04710  [pdf, other]

    cs.LG math.OC

    Untangling Lariats: Subgradient Following of Variationally Penalized Objectives

    Authors: Kai-Chia Mo, Shai Shalev-Shwartz, Nisæl Shártov

    Abstract: We describe a novel subgradient following apparatus for calculating the optimum of convex problems with variational penalties. In this setting, we receive a sequence $y_1,\ldots,y_n$ and seek a smooth sequence $x_1,\ldots,x_n$. The smooth sequence attains the minimum Bregman divergence to an input sequence with additive variational penalties in the general form of $\sum_i g_i(x_{i+1}-x_i)$. We der…

    Submitted 7 May, 2024; originally announced May 2024.

  2. arXiv:2403.19887  [pdf, other]

    cs.CL cs.LG

    Jamba: A Hybrid Transformer-Mamba Language Model

    Authors: Opher Lieber, Barak Lenz, Hofit Bata, Gal Cohen, Jhonathan Osin, Itay Dalmedigos, Erez Safahi, Shaked Meirom, Yonatan Belinkov, Shai Shalev-Shwartz, Omri Abend, Raz Alon, Tomer Asida, Amir Bergman, Roman Glozman, Michael Gokhman, Avashalom Manevich, Nir Ratner, Noam Rozen, Erez Shwartz, Mor Zusman, Yoav Shoham

    Abstract: We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. Specifically, Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model families. MoE is added in some of these layers to increase model capacity while keeping active parameter usage manageable. This flexible architecture allows reso…

    Submitted 3 July, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: Webpage: https://www.ai21.com/jamba

  3. arXiv:2310.17688  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    Managing extreme AI risks amid rapid progress

    Authors: Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Trevor Darrell, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atılım Güneş Baydin, Sheila McIlraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca Dragan, Philip Torr, Stuart Russell, Daniel Kahneman, Jan Brauner, Sören Mindermann

    Abstract: Artificial Intelligence (AI) is progressing rapidly, and companies are shifting their focus to developing generalist AI systems that can autonomously act and pursue goals. Increases in capabilities and autonomy may soon massively amplify AI's impact, with risks that include large-scale social harms, malicious uses, and an irreversible loss of human control over autonomous AI systems. Although rese…

    Submitted 22 May, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: Published in Science: https://www.science.org/doi/10.1126/science.adn0117

  4. arXiv:2302.06354  [pdf, other]

    cs.LG cs.AI

    Less is More: Selective Layer Finetuning with SubTuning

    Authors: Gal Kaplun, Andrey Gurevich, Tal Swisa, Mazor David, Shai Shalev-Shwartz, Eran Malach

    Abstract: Finetuning a pretrained model has become a standard approach for training neural networks on novel tasks, resulting in fast convergence and improved performance. In this work, we study an alternative finetuning method, where instead of finetuning all the weights of the network, we only train a carefully chosen subset of layers, keeping the rest of the weights frozen at their initial (pretrained) v…

    Submitted 2 July, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

  5. arXiv:2205.00445  [pdf, other]

    cs.CL cs.AI

    MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning

    Authors: Ehud Karpas, Omri Abend, Yonatan Belinkov, Barak Lenz, Opher Lieber, Nir Ratner, Yoav Shoham, Hofit Bata, Yoav Levine, Kevin Leyton-Brown, Dor Muhlgay, Noam Rozen, Erez Schwartz, Gal Shachaf, Shai Shalev-Shwartz, Amnon Shashua, Moshe Tenenholtz

    Abstract: Huge language models (LMs) have ushered in a new era for AI, serving as a gateway to natural-language-based knowledge tasks. Although an essential element of modern AI, LMs are also inherently limited in a number of ways. We discuss these limitations and how they can be avoided by adopting a systems approach. Conceptualizing the challenge as one that involves knowledge and reasoning in addition to…

    Submitted 1 May, 2022; originally announced May 2022.

  6. arXiv:2204.10019  [pdf, other]

    cs.CL cs.AI

    Standing on the Shoulders of Giant Frozen Language Models

    Authors: Yoav Levine, Itay Dalmedigos, Ori Ram, Yoel Zeldes, Daniel Jannai, Dor Muhlgay, Yoni Osin, Opher Lieber, Barak Lenz, Shai Shalev-Shwartz, Amnon Shashua, Kevin Leyton-Brown, Yoav Shoham

    Abstract: Huge pretrained language models (LMs) have demonstrated surprisingly good zero-shot capabilities on a wide variety of tasks. This gives rise to the appealing vision of a single, versatile model with a wide range of functionalities across disparate applications. However, current leading techniques for leveraging a "frozen" LM -- i.e., leaving its weights untouched -- still often underperform fine-t…

    Submitted 21 April, 2022; originally announced April 2022.

  7. arXiv:2203.14649  [pdf, other]

    cs.LG cs.AI stat.ML

    Knowledge Distillation: Bad Models Can Be Good Role Models

    Authors: Gal Kaplun, Eran Malach, Preetum Nakkiran, Shai Shalev-Shwartz

    Abstract: Large neural networks trained in the overparameterized regime are able to fit noise to zero train error. Recent work \citep{nakkiran2020distributional} has empirically observed that such networks behave as "conditional samplers" from the noisy distribution. That is, they replicate the noise in the train data to unseen examples. We give a theoretical framework for studying this conditional sampling…

    Submitted 28 March, 2022; originally announced March 2022.

  8. arXiv:2102.00434  [pdf, ps, other]

    cs.LG cs.NE stat.ML

    The Connection Between Approximation, Depth Separation and Learnability in Neural Networks

    Authors: Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, Ohad Shamir

    Abstract: Several recent works have shown separation results between deep neural networks, and hypothesis classes with inferior approximation capacity such as shallow networks or kernel classes. On the other hand, the fact that deep networks can efficiently express a target function does not mean that this target function can be learned efficiently by deep neural networks. In this work we study the intricat…

    Submitted 18 July, 2021; v1 submitted 31 January, 2021; originally announced February 2021.

    Comments: COLT 2021 camera ready version

  9. arXiv:2010.01369  [pdf, other]

    cs.LG stat.ML

    Computational Separation Between Convolutional and Fully-Connected Networks

    Authors: Eran Malach, Shai Shalev-Shwartz

    Abstract: Convolutional neural networks (CNN) exhibit unmatched performance in a multitude of computer vision tasks. However, the advantage of using convolutional networks over fully-connected networks is not understood from a theoretical perspective. In this work, we show how convolutional networks can leverage locality in the data, and thus achieve a computational advantage over fully-connected networks.…

    Submitted 3 October, 2020; originally announced October 2020.

  10. arXiv:2008.08059  [pdf, ps, other]

    cs.LG stat.ML

    When Hardness of Approximation Meets Hardness of Learning

    Authors: Eran Malach, Shai Shalev-Shwartz

    Abstract: A supervised learning algorithm has access to a distribution of labeled examples, and needs to return a function (hypothesis) that correctly labels the examples. The hypothesis of the learner is taken from some fixed class of functions (e.g., linear classifiers, neural networks etc.). A failure of the learning algorithm can occur due to two possible reasons: wrong choice of hypothesis class (hardn…

    Submitted 23 August, 2020; v1 submitted 18 August, 2020; originally announced August 2020.

  11. arXiv:2004.04644  [pdf, ps, other]

    cs.LG

    On the Ethics of Building AI in a Responsible Manner

    Authors: Shai Shalev-Shwartz, Shaked Shammah, Amnon Shashua

    Abstract: The AI-alignment problem arises when there is a discrepancy between the goals that a human designer specifies to an AI learner and a potential catastrophic outcome that does not reflect what the human designer really wants. We argue that a formalism of AI alignment that does not distinguish between strategic and agnostic misalignments is not useful, as it deems all technology as un-safe. We propos…

    Submitted 30 March, 2020; originally announced April 2020.

  12. arXiv:2002.00585  [pdf, ps, other]

    cs.LG stat.ML

    Proving the Lottery Ticket Hypothesis: Pruning is All You Need

    Authors: Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, Ohad Shamir

    Abstract: The lottery ticket hypothesis (Frankle and Carbin, 2018) states that a randomly-initialized network contains a small subnetwork that, when trained in isolation, can compete with the performance of the original network. We prove an even stronger hypothesis (as was also conjectured in Ramanujan et al., 2019), showing that for every bounded distribution and every target network with bounded wei…

    Submitted 3 February, 2020; originally announced February 2020.

  13. arXiv:1910.11923  [pdf, other]

    cs.LG stat.ML

    Learning Boolean Circuits with Neural Networks

    Authors: Eran Malach, Shai Shalev-Shwartz

    Abstract: While on some natural distributions, neural-networks are trained efficiently using gradient-based algorithms, it is known that learning them is computationally hard in the worst-case. To separate hard from easy to learn distributions, we observe the property of local correlation: correlation between local patterns of the input and the target label. We focus on learning deep neural-networks using a…

    Submitted 18 January, 2020; v1 submitted 25 October, 2019; originally announced October 2019.

  14. arXiv:1909.12051  [pdf, other]

    cs.LG stat.ML

    The Implicit Bias of Depth: How Incremental Learning Drives Generalization

    Authors: Daniel Gissin, Shai Shalev-Shwartz, Amit Daniely

    Abstract: A leading hypothesis for the surprising generalization of neural networks is that the dynamics of gradient descent bias the model towards simple solutions, by searching through the solution space in an incremental order of complexity. We formally define the notion of incremental learning dynamics and derive the conditions on depth and initialization for which this phenomenon arises in deep linear…

    Submitted 28 December, 2019; v1 submitted 26 September, 2019; originally announced September 2019.

    Comments: 25 pages, 7 figures, published at the International Conference on Learning Representations (ICLR) 2020

  15. arXiv:1908.05646  [pdf, other]

    cs.CL cs.LG

    SenseBERT: Driving Some Sense into BERT

    Authors: Yoav Levine, Barak Lenz, Or Dagan, Ori Ram, Dan Padnos, Or Sharir, Shai Shalev-Shwartz, Amnon Shashua, Yoav Shoham

    Abstract: The ability to learn from large unlabeled corpora has allowed neural language models to advance the frontier in natural language understanding. However, existing self-supervision techniques operate at the word form level, which serves as a surrogate for the underlying semantic content. This paper proposes a method to employ weak-supervision directly at the word sense level. Our model, named SenseB…

    Submitted 18 May, 2020; v1 submitted 15 August, 2019; originally announced August 2019.

    Comments: Accepted to ACL 2020

  16. arXiv:1907.06347  [pdf, other]

    cs.LG stat.ML

    Discriminative Active Learning

    Authors: Daniel Gissin, Shai Shalev-Shwartz

    Abstract: We propose a new batch mode active learning algorithm designed for neural networks and large query batch sizes. The method, Discriminative Active Learning (DAL), poses active learning as a binary classification task, attempting to choose examples to label in such a way as to make the labeled set and the unlabeled pool indistinguishable. Experimenting on image classification tasks, we empirically s…

    Submitted 15 July, 2019; originally announced July 2019.

    Comments: 11 pages, 3 figures

  17. arXiv:1906.05032  [pdf, ps, other]

    cs.LG stat.ML

    Decoupling Gating from Linearity

    Authors: Jonathan Fiat, Eran Malach, Shai Shalev-Shwartz

    Abstract: ReLU neural-networks have been in the focus of many recent theoretical works, trying to explain their empirical success. Nonetheless, there is still a gap between current theoretical results and empirical observations, even in the case of shallow (one hidden-layer) networks. For example, in the task of memorizing a random sample of size $m$ and dimension $d$, the best theoretical result requires t…

    Submitted 12 June, 2019; originally announced June 2019.

  18. arXiv:1903.03488  [pdf, other]

    cs.LG stat.ML

    Is Deeper Better only when Shallow is Good?

    Authors: Eran Malach, Shai Shalev-Shwartz

    Abstract: Understanding the power of depth in feed-forward neural networks is an ongoing challenge in the field of deep learning theory. While current works account for the importance of depth for the expressive power of neural-networks, it remains an open question whether these benefits are exploited during a gradient-based optimization process. In this work we explore the relation between expressivity pro…

    Submitted 8 March, 2019; originally announced March 2019.

  19. arXiv:1901.05022  [pdf, ps, other]

    cs.RO

    Vision Zero: on a Provable Method for Eliminating Roadway Accidents without Compromising Traffic Throughput

    Authors: Shai Shalev-Shwartz, Shaked Shammah, Amnon Shashua

    Abstract: We propose an economical, viable approach to eliminate almost all car accidents. Our method relies on a mathematical model of safety and can be applied to all modern cars at a mild cost.

    Submitted 17 January, 2019; v1 submitted 9 December, 2018; originally announced January 2019.

  20. arXiv:1803.09522  [pdf, other]

    cs.LG stat.ML

    A Provably Correct Algorithm for Deep Learning that Actually Works

    Authors: Eran Malach, Shai Shalev-Shwartz

    Abstract: We describe a layer-by-layer algorithm for training deep convolutional networks, where each step involves gradient updates for a two layer network followed by a simple clustering algorithm. Our algorithm stems from a deep generative model that generates images level by level, where lower resolution images correspond to latent semantic classes. We analyze the convergence rate of our algorithm assumi…

    Submitted 24 June, 2018; v1 submitted 26 March, 2018; originally announced March 2018.

  21. arXiv:1710.10174  [pdf, other]

    cs.LG

    SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data

    Authors: Alon Brutzkus, Amir Globerson, Eran Malach, Shai Shalev-Shwartz

    Abstract: Neural networks exhibit good generalization behavior in the over-parameterized regime, where the number of network parameters exceeds the number of observations. Nonetheless, current generalization bounds for neural networks fail to explain this phenomenon. In an attempt to bridge this gap, we study the problem of learning a two-layer over-parameterized neural network, when the data is generated b…

    Submitted 27 October, 2017; originally announced October 2017.

  22. arXiv:1708.06374  [pdf, other]

    cs.RO cs.AI stat.ML

    On a Formal Model of Safe and Scalable Self-driving Cars

    Authors: Shai Shalev-Shwartz, Shaked Shammah, Amnon Shashua

    Abstract: In recent years, car makers and tech companies have been racing towards self driving cars. It seems that the main parameter in this race is who will have the first car on the road. The goal of this paper is to add to the equation two additional crucial parameters. The first is standardization of safety assurance --- what are the minimal requirements that every self-driving car must satisfy, and ho…

    Submitted 27 October, 2018; v1 submitted 21 August, 2017; originally announced August 2017.

  23. arXiv:1706.02613  [pdf, other]

    cs.LG

    Decoupling "when to update" from "how to update"

    Authors: Eran Malach, Shai Shalev-Shwartz

    Abstract: Deep learning requires data. A useful approach to obtain data is to be creative and mine data from various sources, that were created for different purposes. Unfortunately, this approach often leads to noisy labels. In this paper, we propose a meta algorithm for tackling the noisy labels problem. The key idea is to decouple "when to update" from "how to update". We demonstrate the effectiveness of…

    Submitted 26 March, 2018; v1 submitted 8 June, 2017; originally announced June 2017.

  24. arXiv:1706.00687  [pdf, ps, other]

    cs.LG

    Weight Sharing is Crucial to Succesful Optimization

    Authors: Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah

    Abstract: Exploiting the great expressive power of Deep Neural Network architectures relies on the ability to train them. While current theoretical work provides, mostly, results showing the hardness of this task, empirical evidence usually differs from this line, with success stories in abundance. A strong position among empirically successful architectures is captured by networks where extensive weight s…

    Submitted 2 June, 2017; originally announced June 2017.

  25. arXiv:1703.07950  [pdf, other]

    cs.LG cs.NE stat.ML

    Failures of Gradient-Based Deep Learning

    Authors: Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah

    Abstract: In recent years, Deep Learning has become the go-to solution for a broad range of applications, often outperforming state-of-the-art. However, it is important, for both theoreticians and practitioners, to gain a deeper understanding of the difficulties and limitations associated with common approaches and algorithms. We describe four types of simple problems, for which the gradient-based algorithm…

    Submitted 26 April, 2017; v1 submitted 23 March, 2017; originally announced March 2017.

  26. arXiv:1701.04271  [pdf, ps, other]

    cs.LG

    Fast Rates for Empirical Risk Minimization of Strict Saddle Problems

    Authors: Alon Gonen, Shai Shalev-Shwartz

    Abstract: We derive bounds on the sample complexity of empirical risk minimization (ERM) in the context of minimizing non-convex risks that admit the strict saddle property. Recent progress in non-convex optimization has yielded efficient algorithms for minimizing such functions. Our results imply that these efficient algorithms are statistically stable and also generalize well. In particular, we derive fas…

    Submitted 4 June, 2017; v1 submitted 16 January, 2017; originally announced January 2017.

  27. arXiv:1610.03295  [pdf, other]

    cs.AI cs.LG stat.ML

    Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving

    Authors: Shai Shalev-Shwartz, Shaked Shammah, Amnon Shashua

    Abstract: Autonomous driving is a multi-agent setting where the host vehicle must apply sophisticated negotiation skills with other road users when overtaking, giving way, merging, taking left and right turns and while pushing ahead in unstructured urban roadways. Since there are many possible scenarios, manually tackling all possible cases will likely yield a too simplistic policy. Moreover, one must balan…

    Submitted 11 October, 2016; originally announced October 2016.

  28. arXiv:1607.02925  [pdf, ps, other]

    cs.IT

    Faster Low-rank Approximation using Adaptive Gap-based Preconditioning

    Authors: Alon Gonen, Shai Shalev-Shwartz

    Abstract: We propose a method for rank $k$ approximation to a given input matrix $X \in \mathbb{R}^{d \times n}$ which runs in time \[ \tilde{O} \left(d ~\cdot~ \min\left\{n + \tilde{sr}(X) \,G^{-2}_{k,p+1}\ ,\ n^{3/4}\, \tilde{sr}(X)^{1/4} \,G^{-1/2}_{k,p+1} \right\} ~\cdot~ \text{poly}(p)\right) ~, \] where $p>k$, $\tilde{sr}(X)$ is related to stable rank of $X$, and $G_{k,p+1} = \frac{\sigma_k-\sigma_p}{\sigma_k}$ is t…

    Submitted 11 July, 2016; originally announced July 2016.

  29. arXiv:1605.07270  [pdf, other]

    cs.CV

    Learning a Metric Embedding for Face Recognition using the Multibatch Method

    Authors: Oren Tadmor, Yonatan Wexler, Tal Rosenwein, Shai Shalev-Shwartz, Amnon Shashua

    Abstract: This work is motivated by the engineering task of achieving a near state-of-the-art face recognition on a minimal computing budget running on an embedded system. Our main technical contribution centers around a novel training method, called Multibatch, for similarity learning, i.e., for the task of generating an invariant "face signature" through training pairs of "same" and "not-same" face images…

    Submitted 23 May, 2016; originally announced May 2016.

  30. arXiv:1604.06915  [pdf, other]

    cs.LG

    On the Sample Complexity of End-to-end Training vs. Semantic Abstraction Training

    Authors: Shai Shalev-Shwartz, Amnon Shashua

    Abstract: We compare the end-to-end training approach to a modular approach in which a system is decomposed into semantically meaningful components. We focus on the sample complexity aspect, in the regime where an extremely high accuracy is necessary, as is the case in autonomous driving applications. We demonstrate cases in which the number of training examples required by the end-to-end approach is expone…

    Submitted 23 April, 2016; originally announced April 2016.

  31. arXiv:1603.03714  [pdf, other]

    cs.LG

    Distribution Free Learning with Local Queries

    Authors: Galit Bary-Weisberg, Amit Daniely, Shai Shalev-Shwartz

    Abstract: The model of learning with \emph{local membership queries} interpolates between the PAC model and the membership queries model by allowing the learner to query the label of any example that is similar to an example in the training set. This model, recently proposed and studied by Awasthi, Feldman and Kanade, aims to facilitate practical use of membership queries. We continue this line of work, p…

    Submitted 11 March, 2016; originally announced March 2016.

  32. arXiv:1602.02350  [pdf, ps, other]

    cs.LG

    Solving Ridge Regression using Sketched Preconditioned SVRG

    Authors: Alon Gonen, Francesco Orabona, Shai Shalev-Shwartz

    Abstract: We develop a novel preconditioning method for ridge regression, based on recent linear sketching methods. By equipping Stochastic Variance Reduced Gradient (SVRG) with this preconditioning process, we obtain a significant speed-up relative to fast stochastic methods such as SVRG, SDCA and SAG.

    Submitted 26 May, 2016; v1 submitted 7 February, 2016; originally announced February 2016.

  33. arXiv:1602.01690  [pdf, other]

    cs.LG

    Minimizing the Maximal Loss: How and Why?

    Authors: Shai Shalev-Shwartz, Yonatan Wexler

    Abstract: A commonly used learning rule is to approximately minimize the \emph{average} loss over the training set. Other learning algorithms, such as AdaBoost and hard-SVM, aim at minimizing the \emph{maximal} loss over the training set. The average loss is more popular, particularly in deep learning, due to three main reasons. First, it can be conveniently minimized using online algorithms, that process f…

    Submitted 22 May, 2016; v1 submitted 4 February, 2016; originally announced February 2016.

    Comments: ICML 2016

  34. arXiv:1602.01582  [pdf, ps, other]

    cs.LG

    SDCA without Duality, Regularization, and Individual Convexity

    Authors: Shai Shalev-Shwartz

    Abstract: Stochastic Dual Coordinate Ascent is a popular method for solving regularized loss minimization for the case of convex losses. We describe variants of SDCA that do not require explicit regularization and do not rely on duality. We prove linear convergence rates even if individual loss functions are non-convex, as long as the expected loss is strongly convex.

    Submitted 21 May, 2016; v1 submitted 4 February, 2016; originally announced February 2016.

    Comments: ICML 2016

  35. arXiv:1602.01580  [pdf, other]

    cs.LG

    Long-term Planning by Short-term Prediction

    Authors: Shai Shalev-Shwartz, Nir Ben-Zrihem, Aviad Cohen, Amnon Shashua

    Abstract: We consider planning problems, which often arise in autonomous driving applications, in which an agent should decide on immediate actions so as to optimize a long term objective. For example, when a car tries to merge in a roundabout it should decide on an immediate acceleration/braking command, while the long term effect of the command is the success/failure of the merge. Such problems are charact…

    Submitted 4 February, 2016; originally announced February 2016.

  36. arXiv:1601.04011  [pdf, ps, other]

    cs.LG

    Average Stability is Invariant to Data Preconditioning. Implications to Exp-concave Empirical Risk Minimization

    Authors: Alon Gonen, Shai Shalev-Shwartz

    Abstract: We show that the average stability notion introduced by \cite{kearns1999algorithmic, bousquet2002stability} is invariant to data preconditioning, for a wide class of generalized linear models that includes most of the known exp-concave losses. In other words, when analyzing the stability rate of a given algorithm, we may assume the optimal preconditioning of the data. This implies that, at least f…

    Submitted 16 April, 2017; v1 submitted 15 January, 2016; originally announced January 2016.

  37. arXiv:1507.02030  [pdf, other]

    cs.LG math.OC

    Beyond Convexity: Stochastic Quasi-Convex Optimization

    Authors: Elad Hazan, Kfir Y. Levy, Shai Shalev-Shwartz

    Abstract: Stochastic convex optimization is a basic and well studied primitive in machine learning. It is well known that convex and Lipschitz functions can be minimized efficiently using Stochastic Gradient Descent (SGD). The Normalized Gradient Descent (NGD) algorithm is an adaptation of Gradient Descent, which updates according to the direction of the gradients, rather than the gradients themselves. In…

    Submitted 28 October, 2015; v1 submitted 8 July, 2015; originally announced July 2015.

  38. arXiv:1506.02649  [pdf, ps, other]

    math.NA cs.LG

    Faster SGD Using Sketched Conditioning

    Authors: Alon Gonen, Shai Shalev-Shwartz

    Abstract: We propose a novel method for speeding up stochastic optimization algorithms via sketching methods, which recently became a powerful tool for accelerating algorithms for numerical linear algebra. We revisit the method of conditioning for accelerating first-order methods and suggest the use of sketching methods for constructing a cheap conditioner that attains a significant speedup with respect to…

    Submitted 8 June, 2015; originally announced June 2015.

  39. arXiv:1503.06833  [pdf, other]

    math.OC cs.LG

    On Lower and Upper Bounds for Smooth and Strongly Convex Optimization Problems

    Authors: Yossi Arjevani, Shai Shalev-Shwartz, Ohad Shamir

    Abstract: We develop a novel framework to study smooth and strongly convex optimization algorithms, both deterministic and stochastic. Focusing on quadratic functions we are able to examine optimization algorithms as a recursive application of linear operators. This, in turn, reveals a powerful connection between a class of optimization algorithms and the analytic theory of polynomials whereby new lower and…

    Submitted 23 March, 2015; originally announced March 2015.

  40. arXiv:1503.03712  [pdf, other]

    cs.LG math.OC

    On Graduated Optimization for Stochastic Non-Convex Problems

    Authors: Elad Hazan, Kfir Y. Levy, Shai Shalev-Shwartz

    Abstract: The graduated optimization approach, also known as the continuation method, is a popular heuristic for solving non-convex problems that has received renewed interest over the last decade. Despite its popularity, very little is known in terms of theoretical convergence analysis. In this paper we describe a new first-order algorithm based on graduated optimization and analyze its performance. We ch…

    Submitted 8 July, 2015; v1 submitted 12 March, 2015; originally announced March 2015.

    Comments: 17 pages

    MSC Class: 68

  41. arXiv:1502.07073  [pdf, ps, other]

    cs.LG

    Strongly Adaptive Online Learning

    Authors: Amit Daniely, Alon Gonen, Shai Shalev-Shwartz

    Abstract: Strongly adaptive algorithms are algorithms whose performance on every time interval is close to optimal. We present a reduction that can transform standard low-regret algorithms into strongly adaptive ones. As a consequence, we derive simple, yet efficient, strongly adaptive algorithms for a handful of problems.

    Submitted 19 June, 2015; v1 submitted 25 February, 2015; originally announced February 2015.

  42. arXiv:1502.06177  [pdf, other]

    cs.LG

    SDCA without Duality

    Authors: Shai Shalev-Shwartz

    Abstract: Stochastic Dual Coordinate Ascent is a popular method for solving regularized loss minimization for the case of convex losses. In this paper we show how a variant of SDCA can be applied to non-convex losses. We prove a linear convergence rate even if individual loss functions are non-convex, as long as the expected loss is convex.

    Submitted 21 February, 2015; originally announced February 2015.

  43. arXiv:1411.3436  [pdf, other]

    stat.ML cs.LG

    SelfieBoost: A Boosting Algorithm for Deep Learning

    Authors: Shai Shalev-Shwartz

    Abstract: We describe and analyze a new boosting algorithm for deep learning called SelfieBoost. Unlike other boosting algorithms, like AdaBoost, which construct ensembles of classifiers, SelfieBoost boosts the accuracy of a single network. We prove a $\log(1/\epsilon)$ convergence rate for SelfieBoost under some "SGD success" assumption which seems to hold in practice.

    Submitted 8 April, 2017; v1 submitted 12 November, 2014; originally announced November 2014.

  44. arXiv:1410.1141  [pdf, other]

    cs.LG cs.AI stat.ML

    On the Computational Efficiency of Training Neural Networks

    Authors: Roi Livni, Shai Shalev-Shwartz, Ohad Shamir

    Abstract: It is well-known that neural networks are computationally hard to train. On the other hand, in practice, modern day neural networks are trained efficiently using SGD and a variety of tricks that include different activation functions (e.g. ReLU), over-specification (i.e., train networks which are larger than needed), and regularization. In this paper we revisit the computational complexity of trai…

    Submitted 28 October, 2014; v1 submitted 5 October, 2014; originally announced October 2014.

    Comments: Section 2 is revised due to a mistake

  45. arXiv:1405.2420  [pdf, ps, other]

    cs.LG

    Optimal Learners for Multiclass Problems

    Authors: Amit Daniely, Shai Shalev-Shwartz

    Abstract: The fundamental theorem of statistical learning states that for binary classification problems, any Empirical Risk Minimization (ERM) learning rule has close to optimal sample complexity. In this paper we seek a generic optimal learner for multiclass prediction. We start by proving a surprising result: a generic optimal multiclass learner must be improper, namely, it must have the ability to o…

    Submitted 10 May, 2014; originally announced May 2014.

  46. arXiv:1402.4844  [pdf, ps, other]

    cs.LG stat.ML

    Subspace Learning with Partial Information

    Authors: Alon Gonen, Dan Rosenbaum, Yonina Eldar, Shai Shalev-Shwartz

    Abstract: The goal of subspace learning is to find a $k$-dimensional subspace of $\mathbb{R}^d$, such that the expected squared distance between instance vectors and the subspace is as small as possible. In this paper we study subspace learning in a partial information setting, in which the learner can only observe $r \le d$ attributes from each instance vector. We propose several efficient algorithms for t…

    Submitted 26 May, 2016; v1 submitted 19 February, 2014; originally announced February 2014.

  47. arXiv:1311.2272  [pdf, other]

    cs.LG cs.CC

    From average case complexity to improper learning complexity

    Authors: Amit Daniely, Nati Linial, Shai Shalev-Shwartz

    Abstract: The basic problem in the PAC model of computational learning theory is to determine which hypothesis classes are efficiently learnable. There is presently a dearth of results showing hardness of learning problems. Moreover, the existing lower bounds fall short of the best known algorithms. The biggest challenge in proving complexity results is to establish hardness of {\em improper learning} (a.…

    Submitted 9 March, 2014; v1 submitted 10 November, 2013; originally announced November 2013.

    Comments: 34 pages

  48. arXiv:1309.2375  [pdf, other]

    stat.ML cs.LG math.NA stat.CO

    Accelerated Proximal Stochastic Dual Coordinate Ascent for Regularized Loss Minimization

    Authors: Shai Shalev-Shwartz, Tong Zhang

    Abstract: We introduce a proximal version of the stochastic dual coordinate ascent method and show how to accelerate the method using an inner-outer iteration procedure. We analyze the runtime of the framework and obtain rates that improve state-of-the-art results for various key machine learning optimization problems including SVM, logistic regression, ridge regression, Lasso, and multiclass SVM. Experimen…

    Submitted 8 October, 2013; v1 submitted 10 September, 2013; originally announced September 2013.

  49. arXiv:1308.2893  [pdf, other]

    cs.LG

    Multiclass learnability and the ERM principle

    Authors: Amit Daniely, Sivan Sabato, Shai Ben-David, Shai Shalev-Shwartz

    Abstract: We study the sample complexity of multiclass prediction in several learning settings. For the PAC setting our analysis reveals a surprising phenomenon: In sharp contrast to binary classification, we show that there exist multiclass hypothesis classes for which some Empirical Risk Minimizers (ERM learners) have lower sample complexity than others. Furthermore, there are classes that are learnable b…

    Submitted 24 November, 2014; v1 submitted 13 August, 2013; originally announced August 2013.

    Journal ref: Journal of Machine Learning Research, 16(Jul):1275-1304, 2015

  50. arXiv:1305.2581  [pdf, other]

    stat.ML cs.LG

    Accelerated Mini-Batch Stochastic Dual Coordinate Ascent

    Authors: Shai Shalev-Shwartz, Tong Zhang

    Abstract: Stochastic dual coordinate ascent (SDCA) is an effective technique for solving regularized loss minimization problems in machine learning. This paper considers an extension of SDCA under the mini-batch setting that is often used in practice. Our main contribution is to introduce an accelerated mini-batch version of SDCA and prove a fast convergence rate for this method. We discuss an implementatio…

    Submitted 12 May, 2013; originally announced May 2013.