Showing 1–15 of 15 results for author: Pezeshki, M

Searching in archive cs.
  1. arXiv:2310.00158  [pdf, other]

    cs.CV cs.AI cs.LG

    Feedback-guided Data Synthesis for Imbalanced Classification

    Authors: Reyhane Askari Hemmat, Mohammad Pezeshki, Florian Bordes, Michal Drozdzal, Adriana Romero-Soriano

    Abstract: Current status quo in machine learning is to use static datasets of real images for training, which often come from long-tailed distributions. With the recent advances in generative models, researchers have started augmenting these static datasets with synthetic data, reporting moderate performance improvements on classification tasks. We hypothesize that these performance gains are limited by the…

    Submitted 29 September, 2023; originally announced October 2023.

  2. arXiv:2309.16748  [pdf, other]

    cs.LG cs.AI stat.ML

    Discovering environments with XRM

    Authors: Mohammad Pezeshki, Diane Bouchacourt, Mark Ibrahim, Nicolas Ballas, Pascal Vincent, David Lopez-Paz

    Abstract: Successful out-of-distribution generalization requires environment annotations. Unfortunately, these are resource-intensive to obtain, and their relevance to model performance is limited by the expectations and perceptual biases of human annotators. Therefore, to enable robust AI systems across applications, we must develop algorithms to automatically discover environments inducing broad generaliz…

    Submitted 28 September, 2023; originally announced September 2023.

  3. arXiv:2306.13253  [pdf, other]

    cs.LG

    Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok

    Authors: Pascal Jr. Tikeng Notsawo, Hattie Zhou, Mohammad Pezeshki, Irina Rish, Guillaume Dumas

    Abstract: This paper focuses on predicting the occurrence of grokking in neural networks, a phenomenon in which perfect generalization emerges long after signs of overfitting or memorization are observed. It has been reported that grokking can only be observed with certain hyper-parameters. This makes it critical to identify the parameters that lead to grokking. However, since grokking occurs after a large…

    Submitted 28 September, 2023; v1 submitted 22 June, 2023; originally announced June 2023.

    Comments: 26 pages, 30 figures

    ACM Class: I.2.6

  4. arXiv:2112.03215  [pdf, other]

    cs.LG cs.AI stat.ML

    Multi-scale Feature Learning Dynamics: Insights for Double Descent

    Authors: Mohammad Pezeshki, Amartya Mitra, Yoshua Bengio, Guillaume Lajoie

    Abstract: A key challenge in building theoretical foundations for deep learning is the complex optimization dynamics of neural networks, resulting from the high-dimensional interactions between the large number of network parameters. Such non-trivial dynamics lead to intriguing behaviors such as the phenomenon of "double descent" of the generalization error. The more commonly studied aspect of this phenomen…

    Submitted 6 December, 2021; originally announced December 2021.

  5. arXiv:2110.14503  [pdf, other]

    cs.LG cs.AI cs.CR

    Simple data balancing achieves competitive worst-group-accuracy

    Authors: Badr Youbi Idrissi, Martin Arjovsky, Mohammad Pezeshki, David Lopez-Paz

    Abstract: We study the problem of learning classifiers that perform well across (known or unknown) groups of data. After observing that common worst-group-accuracy datasets suffer from substantial imbalances, we set out to compare state-of-the-art methods to simple balancing of classes and groups by either subsampling or reweighting data. Our results show that these data balancing baselines achieve state-of…

    Submitted 18 February, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

    Comments: Accepted at CLeaR (Causal Learning and Reasoning) 2022

  6. arXiv:2011.09468  [pdf, other]

    cs.LG math.DS stat.ML

    Gradient Starvation: A Learning Proclivity in Neural Networks

    Authors: Mohammad Pezeshki, Sékou-Oumar Kaba, Yoshua Bengio, Aaron Courville, Doina Precup, Guillaume Lajoie

    Abstract: We identify and formalize a fundamental gradient descent phenomenon resulting in a learning proclivity in over-parameterized neural networks. Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task, despite the presence of other predictive features that fail to be discovered. This work provides a theoretical explanation for the e…

    Submitted 24 November, 2021; v1 submitted 18 November, 2020; originally announced November 2020.

    Comments: Proceeding of NeurIPS 2021

  7. arXiv:1809.06848  [pdf, other]

    cs.LG cs.AI stat.ML

    On the Learning Dynamics of Deep Neural Networks

    Authors: Remi Tachet, Mohammad Pezeshki, Samira Shabanian, Aaron Courville, Yoshua Bengio

    Abstract: While a lot of progress has been made in recent years, the dynamics of learning in deep nonlinear neural networks remain to this day largely misunderstood. In this work, we study the case of binary classification and prove various properties of learning in such networks under strong assumptions such as linear separability of the data. Extending existing results from the linear case, we confirm emp…

    Submitted 11 December, 2020; v1 submitted 18 September, 2018; originally announced September 2018.

    Comments: 19 pages, 7 figures

  8. arXiv:1807.04740  [pdf, other]

    cs.LG stat.ML

    Negative Momentum for Improved Game Dynamics

    Authors: Gauthier Gidel, Reyhane Askari Hemmat, Mohammad Pezeshki, Remi Lepriol, Gabriel Huang, Simon Lacoste-Julien, Ioannis Mitliagkas

    Abstract: Games generalize the single-objective optimization paradigm by introducing different objective functions for different players. Differentiable games often proceed by simultaneous or alternating gradient updates. In machine learning, games are gaining new importance through formulations like generative adversarial networks (GANs) and actor-critic systems. However, compared to single-objective optim…

    Submitted 28 August, 2020; v1 submitted 12 July, 2018; originally announced July 2018.

    Comments: Appears in: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019). Minor changes with respect to the AISTATS version: typo corrected in Thm. 6 (squared condition number instead of condition number; and small change in constant) and dependence in $β$ changed in Theorem 5 for the formal statement; not changing the conclusions. 28 pages

    ACM Class: I.2.6; G.1.6

  9. arXiv:1701.02720  [pdf, other]

    cs.CL cs.LG stat.ML

    Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks

    Authors: Ying Zhang, Mohammad Pezeshki, Philemon Brakel, Saizheng Zhang, Cesar Laurent, Yoshua Bengio, Aaron Courville

    Abstract: Convolutional Neural Networks (CNNs) are effective models for reducing spectral variations and modeling spectral correlations in acoustic features for automatic speech recognition (ASR). Hybrid speech recognition systems incorporating CNNs with Hidden Markov Models/Gaussian Mixture Models (HMMs/GMMs) have achieved the state-of-the-art in various benchmarks. Meanwhile, Connectionist Temporal Classi…

    Submitted 10 January, 2017; originally announced January 2017.

  10. arXiv:1606.01305  [pdf, other]

    cs.NE cs.CL cs.LG

    Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations

    Authors: David Krueger, Tegan Maharaj, János Kramár, Mohammad Pezeshki, Nicolas Ballas, Nan Rosemary Ke, Anirudh Goyal, Yoshua Bengio, Aaron Courville, Chris Pal

    Abstract: We propose zoneout, a novel method for regularizing RNNs. At each timestep, zoneout stochastically forces some hidden units to maintain their previous values. Like dropout, zoneout uses random noise to train a pseudo-ensemble, improving generalization. But by preserving instead of dropping hidden units, gradient information and state information are more readily propagated through time, as in feed…

    Submitted 22 September, 2017; v1 submitted 3 June, 2016; originally announced June 2016.

    Comments: David Krueger and Tegan Maharaj contributed equally to this work

  11. arXiv:1605.02688  [pdf, other]

    cs.SC cs.LG cs.MS

    Theano: A Python framework for fast computation of mathematical expressions

    Authors: The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano, et al. (88 additional authors not shown)

    Abstract: Theano is a Python library that allows to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano is being actively and continuously developed since 2008, mu…

    Submitted 9 May, 2016; originally announced May 2016.

    Comments: 19 pages, 5 figures

  12. arXiv:1511.06430  [pdf, other]

    cs.LG

    Deconstructing the Ladder Network Architecture

    Authors: Mohammad Pezeshki, Linxi Fan, Philemon Brakel, Aaron Courville, Yoshua Bengio

    Abstract: The manual labeling of data is and will remain a costly endeavor. For this reason, semi-supervised learning remains a topic of practical importance. The recently proposed Ladder Network is one such approach that has proven to be very successful. In addition to the supervised objective, the Ladder Network also adds an unsupervised objective corresponding to the reconstruction costs of a stack of de…

    Submitted 24 May, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016

  13. arXiv:1501.00299  [pdf, other]

    cs.NE cs.LG

    Sequence Modeling using Gated Recurrent Neural Networks

    Authors: Mohammad Pezeshki

    Abstract: In this paper, we have used Recurrent Neural Networks to capture and model human motion data and generate motions by prediction of the next immediate data point at each time-step. Our RNN is armed with recently proposed Gated Recurrent Units which has shown promising results in some sequence modeling problems such as Machine Translation and Speech Synthesis. We demonstrate that this model is able…

    Submitted 1 January, 2015; originally announced January 2015.

  14. arXiv:1312.6158  [pdf, other]

    cs.LG cs.CV cs.NE

    Deep Belief Networks for Image Denoising

    Authors: Mohammad Ali Keyvanrad, Mohammad Pezeshki, Mohammad Ali Homayounpour

    Abstract: Deep Belief Networks which are hierarchical generative models are effective tools for feature representation and extraction. Furthermore, DBNs can be used in numerous aspects of Machine Learning such as image denoising. In this paper, we propose a novel method for image denoising which relies on the DBNs' ability in feature representation. This work is based upon learning of the noise behavior. Ge…

    Submitted 2 January, 2014; v1 submitted 20 December, 2013; originally announced December 2013.

    Comments: ICLR 2014 Conference track

  15. arXiv:1312.6157  [pdf, other]

    cs.LG cs.NE

    Distinction between features extracted using deep belief networks

    Authors: Mohammad Pezeshki, Sajjad Gholami, Ahmad Nickabadi

    Abstract: Data representation is an important pre-processing step in many machine learning algorithms. There are a number of methods used for this task such as Deep Belief Networks (DBNs) and Discrete Fourier Transforms (DFTs). Since some of the features extracted using automated feature extraction methods may not always be related to a specific machine learning task, in this paper we propose two methods in…

    Submitted 2 January, 2014; v1 submitted 20 December, 2013; originally announced December 2013.

    Comments: 4 pages, 4 figures, ICLR 2014 workshop track