
Diverse Imagenet Models Transfer Better

Authors: Niv Nayman, Avram Golbert, Asaf Noy, Tan Ping, Lihi Zelnik-Manor

Abstract: A commonly accepted hypothesis is that models with higher accuracy on Imagenet perform better on other downstream tasks, leading to much research dedicated to optimizing Imagenet accuracy. Recently this hypothesis has been challenged by evidence showing that self-supervised models transfer better than their supervised counterparts, despite their inferior Imagenet accuracy. This calls for identifying the additional factors, on top of Imagenet accuracy, that make models transferable. In this work we show that high diversity of the features learnt by the model promotes transferability jointly with Imagenet accuracy. Encouraged by the recent transferability results of self-supervised models, we propose a method that combines self-supervised and supervised pretraining to generate models with both high diversity and high accuracy, and as a result high transferability. We demonstrate our results on several architectures and multiple downstream tasks, including both single-label and multi-label classification.

Submitted 19 April, 2022; originally announced April 2022.

MSC Class: 68T07; 68T10; 68T45 ACM Class: I.2.10; I.2.6; I.4.10

arXiv:2110.12399

BINAS: Bilinear Interpretable Neural Architecture Search

Authors: Niv Nayman, Yonathan Aflalo, Asaf Noy, Rong Jin, Lihi Zelnik-Manor

Abstract: Practical use of neural networks often involves requirements on latency, energy and memory among others. A popular approach to find networks under such requirements is through constrained Neural Architecture Search (NAS). However, previous methods use complicated predictors for the accuracy of the network. Those predictors are hard to interpret and sensitive to many hyperparameters to be tuned, hence, the resulting accuracy of the generated models is often harmed. In this work we resolve this by introducing Bilinear Interpretable Neural Architecture Search (BINAS), that is based on an accurate and simple bilinear formulation of both an accuracy estimator and the expected resource requirement, together with a scalable search method with theoretical guarantees. The simplicity of our proposed estimator together with the intuitive way it is constructed bring interpretability through many insights about the contribution of different design choices. For example, we find that in the examined search space, adding depth and width is more effective at deeper stages of the network and at the beginning of each resolution stage. Our experiments show that BINAS generates comparable to or better architectures than other state-of-the-art NAS methods within a reduced marginal search cost, while strictly satisfying the resource constraints.

Submitted 27 April, 2022; v1 submitted 24 October, 2021; originally announced October 2021.

Comments: The full code is released at https://github.com/Alibaba-MIIL/BINAS

MSC Class: 68T09; 68T45 ACM Class: G.1.6; G.3; I.2.8; I.2.10; I.5.1

arXiv:2102.11646

HardCoRe-NAS: Hard Constrained diffeRentiable Neural Architecture Search

Authors: Niv Nayman, Yonathan Aflalo, Asaf Noy, Lihi Zelnik-Manor

Abstract: Realistic use of neural networks often requires adhering to multiple constraints on latency, energy and memory among others. A popular approach to find fitting networks is through constrained Neural Architecture Search (NAS), however, previous methods enforce the constraint only softly. Therefore, the resulting networks do not exactly adhere to the resource constraint and their accuracy is harmed. In this work we resolve this by introducing Hard Constrained diffeRentiable NAS (HardCoRe-NAS), that is based on an accurate formulation of the expected resource requirement and a scalable search method that satisfies the hard constraint throughout the search. Our experiments show that HardCoRe-NAS generates state-of-the-art architectures, surpassing other NAS methods, while strictly satisfying the hard resource constraints without any tuning required.

Submitted 23 February, 2021; originally announced February 2021.

Comments: Niv Nayman and Yonathan Aflalo contributed equally. An implementation of HardCoRe-NAS is available at: https://github.com/Alibaba-MIIL/HardCoReNAS

MSC Class: 68T09; 68T45 ACM Class: G.1.6; G.3; I.2.8; I.2.10; I.5.1

arXiv:2101.05147

CobBO: Coordinate Backoff Bayesian Optimization with Two-Stage Kernels

Authors: Jian Tan, Niv Nayman, Mengchang Wang

Abstract: Bayesian optimization is a popular method for optimizing expensive black-box functions. Yet it oftentimes struggles in high dimensions where the computation could be prohibitively heavy. To alleviate this problem, we introduce Coordinate backoff Bayesian Optimization (CobBO) with two-stage kernels. During each round, the first stage uses a simple coarse kernel that sacrifices the approximation accuracy for computational efficiency. It captures the global landscape by purposely smoothing away local fluctuations. Then, in the second stage of the same round, past observed points in the full space are projected to the selected subspace to form virtual points. These virtual points, along with the means and variances of their unknown function values estimated using the simple kernel of the first stage, are fitted to a more sophisticated kernel model in the second stage. Within the selected low dimensional subspace, the computational cost of conducting Bayesian optimization therein becomes affordable. To further enhance the performance, a sequence of consecutive observations in the same subspace are collected, which can effectively refine the approximation of the function. This refinement lasts until a stopping rule is met determining when to back off from a certain subspace and switch to another. This decoupling significantly reduces the computational burden in high dimensions, which fully leverages the observations in the whole space rather than only relying on observations in each coordinate subspace. Extensive evaluations show that CobBO finds solutions comparable to or better than other state-of-the-art methods for dimensions ranging from tens to hundreds, while reducing both the trial complexity and computational costs.

Submitted 19 April, 2022; v1 submitted 13 January, 2021; originally announced January 2021.

Comments: Jian Tan and Niv Nayman contributed equally. An implementation of CobBO is available at: https://github.com/Alibaba-MIIL/CobBO

MSC Class: 62C12; 68T20; 68W27 ACM Class: I.2.8; I.2.6; G.3

arXiv:1906.08031

XNAS: Neural Architecture Search with Expert Advice

Authors: Niv Nayman, Asaf Noy, Tal Ridnik, Itamar Friedman, Rong Jin, Lihi Zelnik-Manor

Abstract: This paper introduces a novel optimization method for differential neural architecture search, based on the theory of prediction with expert advice. Its optimization criterion is well fitted for an architecture-selection, i.e., it minimizes the regret incurred by a sub-optimal selection of operations. Unlike previous search relaxations, that require hard pruning of architectures, our method is designed to dynamically wipe out inferior architectures and enhance superior ones. It achieves an optimal worst-case regret bound and suggests the use of multiple learning-rates, based on the amount of information carried by the backward gradients. Experiments show that our algorithm achieves a strong performance over several image classification datasets. Specifically, it obtains an error rate of 1.6% for CIFAR-10, 24% for ImageNet under mobile settings, and achieves state-of-the-art results on three additional datasets.

Submitted 19 June, 2019; originally announced June 2019.
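
Illustrative note: the XNAS abstract above builds on the theory of prediction with expert advice. As a rough orientation only, the following is a minimal sketch of the textbook Hedge (multiplicative weights) update from that theory, where each candidate operation is treated as an "expert". It is not the XNAS algorithm: the function names, the fixed learning rate, and the toy losses are assumptions made for illustration, and the wipeout rule and multiple learning rates mentioned in the abstract are not modeled.

```python
import math


def hedge_update(weights, losses, learning_rate):
    """One Hedge step: scale each expert's weight by exp(-lr * loss), then renormalize."""
    scaled = [w * math.exp(-learning_rate * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def run_hedge(loss_rounds, learning_rate=0.5):
    """Start from uniform weights and apply one Hedge step per round of losses.

    loss_rounds is a list of rounds; each round holds one loss per expert
    (hypothetical toy values here, not losses from any real search).
    """
    num_experts = len(loss_rounds[0])
    weights = [1.0 / num_experts] * num_experts
    for losses in loss_rounds:
        weights = hedge_update(weights, losses, learning_rate)
    return weights


if __name__ == "__main__":
    # Expert 0 has the lowest loss in every round, so its weight grows the most.
    toy_rounds = [[0.1, 0.9, 0.5], [0.2, 0.8, 0.5], [0.1, 0.7, 0.6]]
    print(run_hedge(toy_rounds))
```

In this sketch the weights always stay a probability distribution, and experts with consistently higher loss decay exponentially, which is the general behavior that regret-based selection relies on.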

arXiv:1904.04123

ASAP: Architecture Search, Anneal and Prune

Authors: Asaf Noy, Niv Nayman, Tal Ridnik, Nadav Zamir, Sivan Doveh, Itamar Friedman, Raja Giryes, Lihi Zelnik-Manor

Abstract: Automatic methods for Neural Architecture Search (NAS) have been shown to produce state-of-the-art network models. Yet, their main drawback is the computational complexity of the search process. As some primal methods optimized over a discrete search space, thousands of days of GPU were required for convergence. A recent approach is based on constructing a differentiable search space that enables gradient-based optimization, which reduces the search time to a few days. While successful, it still includes some noncontinuous steps, e.g., the pruning of many weak connections at once. In this paper, we propose a differentiable search space that allows the annealing of architecture weights, while gradually pruning inferior operations. In this way, the search converges to a single output network in a continuous manner. Experiments on several vision datasets demonstrate the effectiveness of our method with respect to the search cost and accuracy of the achieved model. Specifically, with $0.2$ GPU search days we achieve an error rate of $1.68\%$ on CIFAR-10.

Submitted 10 October, 2019; v1 submitted 8 April, 2019; originally announced April 2019.
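
Illustrative note: the ASAP abstract above describes annealing architecture weights while gradually pruning inferior operations. The following is a minimal sketch of that general idea using a temperature-scaled softmax over per-operation scores. It is not the paper's procedure: the function names, the geometric temperature decay, the fixed pruning threshold, and the fixed toy scores are all assumptions for illustration. In an actual search the scores would also be updated by gradient steps between pruning rounds.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature; a lower temperature gives a sharper distribution."""
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((x - top) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def anneal_and_prune(logits, start_temp=1.0, decay=0.7, threshold=0.05, steps=20):
    """Lower the temperature each step and drop operations whose probability falls below threshold.

    Returns the sorted indices of the operations that survive. The schedule
    (start_temp, decay) and threshold are hypothetical values, not from the paper.
    """
    active = dict(enumerate(logits))  # operation index -> score
    temperature = start_temp
    for _ in range(steps):
        if len(active) == 1:
            break
        indices = list(active)
        probs = softmax_with_temperature([active[i] for i in indices], temperature)
        best = max(range(len(indices)), key=lambda k: probs[k])
        for k, op_index in enumerate(indices):
            # Never prune the current best operation; prune the clearly weak ones.
            if k != best and probs[k] < threshold:
                del active[op_index]
        temperature *= decay
    return sorted(active)


if __name__ == "__main__":
    # Toy scores for four candidate operations on one edge.
    print(anneal_and_prune([2.0, 1.0, 0.5, -1.0]))
```

Because operations are removed a few at a time as the distribution sharpens, the sketch narrows to a single choice gradually rather than through one large pruning step.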

Showing 1–6 of 6 results for author: Nayman, N