Search | arXiv e-print repository

LayerNAS: Neural Architecture Search in Polynomial Complexity

Authors: Yicheng Fan, Dana Alon, **gyue Shen, Daiyi Peng, Keshav Kumar, Yun Long, Xin Wang, Fotis Iliopoulos, Da-Cheng Juan, Erik Vee

Abstract: Neural Architecture Search (NAS) has become a popular method for discovering effective model architectures, especially for target hardware. As such, NAS methods that find optimal architectures under constraints are essential. In our paper, we propose LayerNAS to address the challenge of multi-objective NAS by transforming it into a combinatorial optimization problem, which effectively constrains t… ▽ More Neural Architecture Search (NAS) has become a popular method for discovering effective model architectures, especially for target hardware. As such, NAS methods that find optimal architectures under constraints are essential. In our paper, we propose LayerNAS to address the challenge of multi-objective NAS by transforming it into a combinatorial optimization problem, which effectively constrains the search complexity to be polynomial. For a model architecture with $L$ layers, we perform layerwise-search for each layer, selecting from a set of search options $\mathbb{S}$. LayerNAS groups model candidates based on one objective, such as model size or latency, and searches for the optimal model based on another objective, thereby splitting the cost and reward elements of the search. This approach limits the search complexity to $ O(H \cdot |\mathbb{S}| \cdot L) $, where $H$ is a constant set in LayerNAS. Our experiments show that LayerNAS is able to consistently discover superior models across a variety of search spaces in comparison to strong baselines, including search spaces derived from NATS-Bench, MobileNetV2 and MobileNetV3. △ Less

Submitted 22 April, 2023; originally announced April 2023.

arXiv:2302.03806 [pdf, other]

SLaM: Student-Label Mixing for Distillation with Unlabeled Examples

Authors: Vasilis Kontonis, Fotis Iliopoulos, Khoa Trinh, Cenk Baykal, Gaurav Menghani, Erik Vee

Abstract: Knowledge distillation with unlabeled examples is a powerful training paradigm for generating compact and lightweight student models in applications where the amount of labeled data is limited but one has access to a large pool of unlabeled data. In this setting, a large teacher model generates ``soft'' pseudo-labels for the unlabeled dataset which are then used for training the student model. Des… ▽ More Knowledge distillation with unlabeled examples is a powerful training paradigm for generating compact and lightweight student models in applications where the amount of labeled data is limited but one has access to a large pool of unlabeled data. In this setting, a large teacher model generates ``soft'' pseudo-labels for the unlabeled dataset which are then used for training the student model. Despite its success in a wide variety of applications, a shortcoming of this approach is that the teacher's pseudo-labels are often noisy, leading to impaired student performance. In this paper, we present a principled method for knowledge distillation with unlabeled examples that we call Student-Label Mixing (SLaM) and we show that it consistently improves over prior approaches by evaluating it on several standard benchmarks. Finally, we show that SLaM comes with theoretical guarantees; along the way we give an algorithm improving the best-known sample complexity for learning halfspaces with margin under random classification noise, and provide the first convergence analysis for so-called ``forward loss-adjustment" methods. △ Less

Submitted 8 June, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

arXiv:2210.06711 [pdf, other]

Weighted Distillation with Unlabeled Examples

Authors: Fotis Iliopoulos, Vasilis Kontonis, Cenk Baykal, Gaurav Menghani, Khoa Trinh, Erik Vee

Abstract: Distillation with unlabeled examples is a popular and powerful method for training deep neural networks in settings where the amount of labeled data is limited: A large ''teacher'' neural network is trained on the labeled data available, and then it is used to generate labels on an unlabeled dataset (typically much larger in size). These labels are then utilized to train the smaller ''student'' mo… ▽ More Distillation with unlabeled examples is a popular and powerful method for training deep neural networks in settings where the amount of labeled data is limited: A large ''teacher'' neural network is trained on the labeled data available, and then it is used to generate labels on an unlabeled dataset (typically much larger in size). These labels are then utilized to train the smaller ''student'' model which will actually be deployed. Naturally, the success of the approach depends on the quality of the teacher's labels, since the student could be confused if trained on inaccurate data. This paper proposes a principled approach for addressing this issue based on a ''debiasing'' reweighting of the student's loss function tailored to the distillation training paradigm. Our method is hyper-parameter free, data-agnostic, and simple to implement. We demonstrate significant improvements on popular academic datasets and we accompany our results with a theoretical analysis which rigorously justifies the performance of our method in certain settings. △ Less

Submitted 13 October, 2022; originally announced October 2022.

Comments: To appear in NeurIPS 2022

arXiv:2210.01213 [pdf, other]

Robust Active Distillation

Authors: Cenk Baykal, Khoa Trinh, Fotis Iliopoulos, Gaurav Menghani, Erik Vee

Abstract: Distilling knowledge from a large teacher model to a lightweight one is a widely successful approach for generating compact, powerful models in the semi-supervised learning setting where a limited amount of labeled data is available. In large-scale applications, however, the teacher tends to provide a large number of incorrect soft-labels that impairs student performance. The sheer size of the tea… ▽ More Distilling knowledge from a large teacher model to a lightweight one is a widely successful approach for generating compact, powerful models in the semi-supervised learning setting where a limited amount of labeled data is available. In large-scale applications, however, the teacher tends to provide a large number of incorrect soft-labels that impairs student performance. The sheer size of the teacher additionally constrains the number of soft-labels that can be queried due to prohibitive computational and/or financial costs. The difficulty in achieving simultaneous \emph{efficiency} (i.e., minimizing soft-label queries) and \emph{robustness} (i.e., avoiding student inaccuracies due to incorrect labels) hurts the widespread application of knowledge distillation to many modern tasks. In this paper, we present a parameter-free approach with provable guarantees to query the soft-labels of points that are simultaneously informative and correctly labeled by the teacher. At the core of our work lies a game-theoretic formulation that explicitly considers the inherent trade-off between the informativeness and correctness of input instances. We establish bounds on the expected performance of our approach that hold even in worst-case distillation instances. We present empirical evaluations on popular benchmarks that demonstrate the improved distillation performance enabled by our work relative to that of state-of-the-art active learning and active distillation methods. △ Less

Submitted 4 February, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

arXiv:2011.05258 [pdf, other]

Group testing and local search: is there a computational-statistical gap?

Authors: Fotis Iliopoulos, Ilias Zadik

Abstract: In this work we study the fundamental limits of approximate recovery in the context of group testing. One of the most well-known, theoretically optimal, and easy to implement testing procedures is the non-adaptive Bernoulli group testing problem, where all tests are conducted in parallel, and each item is chosen to be part of any certain test independently with some fixed probability. In this sett… ▽ More In this work we study the fundamental limits of approximate recovery in the context of group testing. One of the most well-known, theoretically optimal, and easy to implement testing procedures is the non-adaptive Bernoulli group testing problem, where all tests are conducted in parallel, and each item is chosen to be part of any certain test independently with some fixed probability. In this setting, there is an observed gap between the number of tests above which recovery is information theoretically (IT) possible, and the number of tests required by the currently best known efficient algorithms to succeed. Often times such gaps are explained by a phase transition in the landscape of the solution space of the problem (an Overlap Gap Property phase transition). In this paper we seek to understand whether such a phenomenon takes place for Bernoulli group testing as well. Our main contributions are the following: (1) We provide first moment evidence that, perhaps surprisingly, such a phase transition does not take place throughout the regime for which recovery is IT possible. This fact suggests that the model is in fact amenable to local search algorithms ; (2) we prove the complete absence of "bad" local minima for a part of the "hard" regime, a fact which implies an improvement over known theoretical results on the performance of efficient algorithms for approximate recovery without false-negatives, and finally (3) we present extensive simulations that strongly suggest that a very simple local algorithm known as Glauber Dynamics does indeed succeed, and can be used to efficiently implement the well-known (theoretically optimal) Smallest Satisfying Set (SSS) estimator. △ Less

Submitted 28 July, 2021; v1 submitted 10 November, 2020; originally announced November 2020.

Comments: Accepted for publication in COLT 2021. Various minor mistakes are corrected

arXiv:2008.05569 [pdf, ps, other]

A new notion of commutativity for the algorithmic Lovász Local Lemma

Authors: David G. Harris, Fotis Iliopoulos, Vladimir Kolmogorov

Abstract: The Lovász Local Lemma (LLL) is a powerful tool in probabilistic combinatorics which can be used to establish the existence of objects that satisfy certain properties. The breakthrough paper of Moser and Tardos and follow-up works revealed that the LLL has intimate connections with a class of stochastic local search algorithms for finding such desirable objects. In particular, it can be seen as a… ▽ More The Lovász Local Lemma (LLL) is a powerful tool in probabilistic combinatorics which can be used to establish the existence of objects that satisfy certain properties. The breakthrough paper of Moser and Tardos and follow-up works revealed that the LLL has intimate connections with a class of stochastic local search algorithms for finding such desirable objects. In particular, it can be seen as a sufficient condition for this type of algorithms to converge fast. Besides conditions for existence of and fast convergence to desirable objects, one may naturally ask further questions regarding properties of these algorithms. For instance, "are they parallelizable?", "how many solutions can they output?", "what is the expected "weight" of a solution?", etc. These questions and more have been answered for a class of LLL-inspired algorithms called commutative. In this paper we introduce a new, very natural and more general notion of commutativity (essentially matrix commutativity) which allows us to show a number of new refined properties of LLL-inspired local search algorithms with significantly simpler proofs. △ Less

Submitted 30 March, 2022; v1 submitted 12 August, 2020; originally announced August 2020.

Journal ref: RANDOM 2021

arXiv:2004.02066 [pdf, ps, other]

Improved bounds for coloring locally sparse hypergraphs

Authors: Fotis Iliopoulos

Abstract: We show that, for every $k \ge 2$, every $k$-uniform hypergaph of degree $Δ$ and girth at least $5$ is efficiently $(1+o(1) )(k-1) (Δ/ \ln Δ)^{ 1/(k-1) } $-list colorable. As an application (and to the best of our knowledge) we obtain the currently best algorithm for list-coloring random hypergraphs of bounded average degree. We show that, for every $k \ge 2$, every $k$-uniform hypergaph of degree $Δ$ and girth at least $5$ is efficiently $(1+o(1) )(k-1) (Δ/ \ln Δ)^{ 1/(k-1) } $-list colorable. As an application (and to the best of our knowledge) we obtain the currently best algorithm for list-coloring random hypergraphs of bounded average degree. △ Less

Submitted 7 August, 2021; v1 submitted 4 April, 2020; originally announced April 2020.

arXiv:1812.10309 [pdf, ps, other]

Efficiently list-edge coloring multigraphs asymptotically optimally

Authors: Fotis Iliopoulos, Alistair Sinclair

Abstract: We give polynomial time algorithms for the seminal results of Kahn, who showed that the Goldberg-Seymour and List-Coloring conjectures for (list-)edge coloring multigraphs hold asymptotically. Kahn's arguments are based on the probabilistic method and are non-constructive. Our key insight is to show that the main result of Achlioptas, Iliopoulos and Kolmogorov for analyzing local search algorithms… ▽ More We give polynomial time algorithms for the seminal results of Kahn, who showed that the Goldberg-Seymour and List-Coloring conjectures for (list-)edge coloring multigraphs hold asymptotically. Kahn's arguments are based on the probabilistic method and are non-constructive. Our key insight is to show that the main result of Achlioptas, Iliopoulos and Kolmogorov for analyzing local search algorithms can be used to make constructive applications of a powerful version of the so-called Lopsided Lovasz Local Lemma. In particular, we use it to design algorithms that exploit the fact that correlations in the probability spaces on matchings used by Kahn decay with distance. △ Less

Submitted 7 December, 2021; v1 submitted 26 December, 2018; originally announced December 2018.

arXiv:1809.07910 [pdf, ps, other]

Simple Local Computation Algorithms for the General Lovasz Local Lemma

Authors: Dimitris Achlioptas, Themis Gouleakis, Fotis Iliopoulos

Abstract: We consider the task of designing Local Computation Algorithms (LCA) for applications of the Lovász Local Lemma (LLL). LCA is a class of sublinear algorithms proposed by Rubinfeld et al.~\cite{Ronitt} that have received a lot of attention in recent years. The LLL is an existential, sufficient condition for a collection of sets to have non-empty intersection (in applications, often, each set compri… ▽ More We consider the task of designing Local Computation Algorithms (LCA) for applications of the Lovász Local Lemma (LLL). LCA is a class of sublinear algorithms proposed by Rubinfeld et al.~\cite{Ronitt} that have received a lot of attention in recent years. The LLL is an existential, sufficient condition for a collection of sets to have non-empty intersection (in applications, often, each set comprises all objects having a certain property). The ground-breaking algorithm of Moser and Tardos~\cite{MT} made the LLL fully constructive, following earlier results by Beck~\cite{beck_lll} and Alon~\cite{alon_lll} giving algorithms under significantly stronger LLL-like conditions. LCAs under those stronger conditions were given in~\cite{Ronitt}, where it was asked if the Moser-Tardos algorithm can be used to design LCAs under the standard LLL condition. The main contribution of this paper is to answer this question affirmatively. In fact, our techniques yield LCAs for settings beyond the standard LLL condition. △ Less

Submitted 6 July, 2020; v1 submitted 20 September, 2018; originally announced September 2018.

arXiv:1809.01537 [pdf, ps, other]

A Local Lemma for Focused Stochastic Algorithms

Authors: Dimitris Achlioptas, Fotis Iliopoulos, Vladimir Kolmogorov

Abstract: We develop a framework for the rigorous analysis of focused stochastic local search algorithms. These are algorithms that search a state space by repeatedly selecting some constraint that is violated in the current state and moving to a random nearby state that addresses the violation, while hopefully not introducing many new ones. An important class of focused local search algorithms with provabl… ▽ More We develop a framework for the rigorous analysis of focused stochastic local search algorithms. These are algorithms that search a state space by repeatedly selecting some constraint that is violated in the current state and moving to a random nearby state that addresses the violation, while hopefully not introducing many new ones. An important class of focused local search algorithms with provable performance guarantees has recently arisen from algorithmizations of the Lovász Local Lemma (LLL), a non-constructive tool for proving the existence of satisfying states by introducing a background measure on the state space. While powerful, the state transitions of algorithms in this class must be, in a precise sense, perfectly compatible with the background measure. In many applications this is a very restrictive requirement and one needs to step outside the class. Here we introduce the notion of \emph{measure distortion} and develop a framework for analyzing arbitrary focused stochastic local search algorithms, recovering LLL algorithmizations as the special case of no distortion. Our framework takes as input an arbitrary such algorithm and an arbitrary probability measure and shows how to use the measure as a yardstick of algorithmic progress, even for algorithms designed independently of the measure. △ Less

Submitted 3 September, 2018; originally announced September 2018.

Comments: This paper is based on results that appeared in preliminary form in SODA 2016 and FOCS 2016. arXiv admin note: text overlap with arXiv:1507.07633

arXiv:1805.02026 [pdf, ps, other]

Beyond the Lovasz Local Lemma: Point to Set Correlations and Their Algorithmic Applications

Authors: Dimitris Achlioptas, Fotis Iliopoulos, Alistair Sinclair

Abstract: Following the groundbreaking algorithm of Moser and Tardos for the Lovasz Local Lemma (LLL), there has been a plethora of results analyzing local search algorithms for various constraint satisfaction problems. The algorithms considered fall into two broad categories: resampling algorithms, analyzed via different algorithmic LLL conditions; and backtracking algorithms, analyzed via entropy compress… ▽ More Following the groundbreaking algorithm of Moser and Tardos for the Lovasz Local Lemma (LLL), there has been a plethora of results analyzing local search algorithms for various constraint satisfaction problems. The algorithms considered fall into two broad categories: resampling algorithms, analyzed via different algorithmic LLL conditions; and backtracking algorithms, analyzed via entropy compression arguments. This paper introduces a new convergence condition that seamlessly handles resampling, backtracking, and hybrid algorithms, i.e., algorithms that perform both resampling and backtracking steps. Unlike all past LLL work, our condition replaces the notion of a dependency or causality graph by quantifying point-to-set correlations between bad events. As a result, our condition simultaneously: (i)~captures the most general algorithmic LLL condition known as a special case; (ii)~significantly simplifies the analysis of entropy compression applications; (iii)~relates backtracking algorithms, which are conceptually very different from resampling algorithms, to the LLL; and most importantly (iv)~allows for the analysis of hybrid algorithms, which were outside the scope of previous techniques. We give several applications of our condition, including a new hybrid vertex coloring algorithm that extends the recent breakthrough result of Molloy for coloring triangle-free graphs to arbitrary graphs. △ Less

Submitted 18 August, 2020; v1 submitted 5 May, 2018; originally announced May 2018.

arXiv:1704.02796 [pdf, ps, other]

Commutative Algorithms Approximate the LLL-distribution

Authors: Fotis Iliopoulos

Abstract: Following the groundbreaking Moser-Tardos algorithm for the Lovasz Local Lemma (LLL), a series of works have exploited a key ingredient of the original analysis, the witness tree lemma, in order to: derive deterministic, parallel and distributed algorithms for the LLL, to estimate the entropy of the output distribution, to partially avoid bad events, to deal with super-polynomially many bad events… ▽ More Following the groundbreaking Moser-Tardos algorithm for the Lovasz Local Lemma (LLL), a series of works have exploited a key ingredient of the original analysis, the witness tree lemma, in order to: derive deterministic, parallel and distributed algorithms for the LLL, to estimate the entropy of the output distribution, to partially avoid bad events, to deal with super-polynomially many bad events, and even to devise new algorithmic frameworks. Meanwhile, a parallel line of work, has established tools for analyzing stochastic local search algorithms motivated by the LLL that do not fall within the Moser-Tardos framework. Unfortunately, the aforementioned results do not transfer to these more general settings. Mainly, this is because the witness tree lemma, provably, no longer holds. Here we prove that for commutative algorithms, a class recently introduced by Kolmogorov and which captures the vast majority of LLL applications, the witness tree lemma does hold. Armed with this fact, we extend the main result of Haeupler, Saha, and Srinivasan to commutative algorithms, establishing that the output of such algorithms well-approximates the LLL-distribution, i.e., the distribution obtained by conditioning on all bad events being avoided, and give several new applications. For example, we show that the recent algorithm of Molloy for list coloring number of sparse, triangle-free graphs can output exponential many list colorings of the input graph. △ Less

Submitted 8 June, 2019; v1 submitted 10 April, 2017; originally announced April 2017.

arXiv:1607.06494 [pdf, ps, other]

Stochastic Control via Entropy Compression

Authors: Dimitris Achlioptas, Fotis Iliopoulos, Nikos Vlassis

Abstract: We consider an agent trying to bring a system to an acceptable state by repeated probabilistic action. Several recent works on algorithmizations of the Lovasz Local Lemma (LLL) can be seen as establishing sufficient conditions for the agent to succeed. Here we study whether such stochastic control is also possible in a noisy environment, where both the process of state-observation and the process… ▽ More We consider an agent trying to bring a system to an acceptable state by repeated probabilistic action. Several recent works on algorithmizations of the Lovasz Local Lemma (LLL) can be seen as establishing sufficient conditions for the agent to succeed. Here we study whether such stochastic control is also possible in a noisy environment, where both the process of state-observation and the process of state-evolution are subject to adversarial perturbation (noise). The introduction of noise causes the tools developed for LLL algorithmization to break down since the key LLL ingredient, the sparsity of the causality (dependence) relationship, no longer holds. To overcome this challenge we develop a new analysis where entropy plays a central role, both to measure the rate at which progress towards an acceptable state is made and the rate at which noise undoes this progress. The end result is a sufficient condition that allows a smooth tradeoff between the intensity of the noise and the amenability of the system, recovering an asymmetric LLL condition in the noiseless case. △ Less

Submitted 26 November, 2016; v1 submitted 21 July, 2016; originally announced July 2016.

Comments: 18 pages

arXiv:1507.07633 [pdf, ps, other]

Focused Stochastic Local Search and the Lovász Local Lemma

Authors: Dimitris Achlioptas, Fotis Iliopoulos

Abstract: We develop tools for analyzing focused stochastic local search algorithms. These are algorithms which search a state space probabilistically by repeatedly selecting a constraint that is violated in the current state and moving to a random nearby state which, hopefully, addresses the violation without introducing many new ones. A large class of such algorithms arise from the algorithmization of the… ▽ More We develop tools for analyzing focused stochastic local search algorithms. These are algorithms which search a state space probabilistically by repeatedly selecting a constraint that is violated in the current state and moving to a random nearby state which, hopefully, addresses the violation without introducing many new ones. A large class of such algorithms arise from the algorithmization of the Lovász Local Lemma, a non-constructive tool for proving the existence of satisfying states. Here we give tools that provide a unified analysis of such algorithms and of many more, expressing them as instances of a general framework. △ Less

Submitted 15 August, 2015; v1 submitted 27 July, 2015; originally announced July 2015.

Comments: Generalized the analysis of the Recursive Walk algorithm; corrected the proof of Acyclic Edge Coloring result

MSC Class: 68W20 ACM Class: F.1.2; G.3

arXiv:1406.0242 [pdf, ps, other]

Random Walks that Find Perfect Objects and the Lovász Local Lemma

Authors: Dimitris Achlioptas, Fotis Iliopoulos

Abstract: We give an algorithmic local lemma by establishing a sufficient condition for the uniform random walk on a directed graph to reach a sink quickly. Our work is inspired by Moser's entropic method proof of the Lovász Local Lemma (LLL) for satisfiability and completely bypasses the Probabilistic Method formulation of the LLL. In particular, our method works when the underlying state space is entirely… ▽ More We give an algorithmic local lemma by establishing a sufficient condition for the uniform random walk on a directed graph to reach a sink quickly. Our work is inspired by Moser's entropic method proof of the Lovász Local Lemma (LLL) for satisfiability and completely bypasses the Probabilistic Method formulation of the LLL. In particular, our method works when the underlying state space is entirely unstructured. Similarly to Moser's argument, the key point is that the inevitability of reaching a sink is established by bounding the entropy of the walk as a function of time. △ Less

Submitted 8 April, 2015; v1 submitted 2 June, 2014; originally announced June 2014.

Comments: 28 pages, added weighted version, added Independent Sets version, added Latin Squares Application

MSC Class: 68W20 ACM Class: F.1.2; G.3

Showing 1–15 of 15 results for author: Iliopoulos, F