-
Distilling Optimal Neural Networks: Rapid Search in Diverse Spaces
Authors:
Bert Moons,
Parham Noorzad,
Andrii Skliar,
Giovanni Mariani,
Dushyant Mehta,
Chris Lott,
Tijmen Blankevoort
Abstract:
Current state-of-the-art Neural Architecture Search (NAS) methods neither efficiently scale to multiple hardware platforms, nor handle diverse architectural search-spaces. To remedy this, we present DONNA (Distilling Optimal Neural Network Architectures), a novel pipeline for rapid, scalable and diverse NAS, that scales to many user scenarios. DONNA consists of three phases. First, an accuracy predictor is built using blockwise knowledge distillation from a reference model. This predictor enables searching across diverse networks with varying macro-architectural parameters such as layer types and attention mechanisms, as well as across micro-architectural parameters such as block repeats and expansion rates. Second, a rapid evolutionary search finds a set of Pareto-optimal architectures for any scenario using the accuracy predictor and on-device measurements. Third, optimal models are quickly finetuned to training-from-scratch accuracy. DONNA is up to 100x faster than MNasNet in finding state-of-the-art architectures on-device. Classifying ImageNet, DONNA architectures are 20% faster than EfficientNet-B0 and MobileNetV2 on an Nvidia V100 GPU and 10% faster with 0.5% higher accuracy than MobileNetV2-1.4x on a Samsung S20 smartphone. In addition to NAS, DONNA is used for search-space extension and exploration, as well as hardware-aware model compression.
Submitted 27 August, 2021; v1 submitted 16 December, 2020;
originally announced December 2020.
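The second phase described in the abstract above selects Pareto-optimal architectures from predicted accuracy and measured latency. The sketch below illustrates only that filtering idea, not DONNA's evolutionary search or its accuracy predictor. The function name `pareto_front` and the `(accuracy, latency)` tuple layout are assumptions made for illustration, and the numbers in the usage note are invented.

```python
def pareto_front(candidates):
    """Return the candidates that no other candidate dominates.

    Each candidate is a hypothetical (accuracy, latency) tuple.
    Higher accuracy is better, lower latency is better.
    """
    # Sort by latency ascending; on equal latency, put higher accuracy first.
    ordered = sorted(candidates, key=lambda c: (c[1], -c[0]))
    front = []
    best_accuracy = float("-inf")
    for accuracy, latency in ordered:
        # Everything already seen is at least as fast, so a candidate is
        # kept only if it is strictly more accurate than all of them.
        if accuracy > best_accuracy:
            front.append((accuracy, latency))
            best_accuracy = accuracy
    return front
```

Calling `pareto_front([(0.70, 10), (0.75, 12), (0.72, 15), (0.80, 20), (0.78, 25)])` keeps the three points that trade latency for accuracy and drops the two that are both slower and less accurate than another candidate.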
-
Negligible Cooperation: Contrasting the Maximal- and Average-Error Cases
Authors:
Parham Noorzad,
Michael Langberg,
Michelle Effros
Abstract:
In communication networks, cooperative strategies are coding schemes where network nodes work together to improve network performance metrics such as the total rate delivered across the network. This work studies encoder cooperation in the setting of a discrete multiple access channel (MAC) with two encoders and a single decoder. A network node, here called the cooperation facilitator (CF), that is connected to both encoders via rate-limited links, enables the cooperation strategy. Previous work by the authors presents two classes of MACs: (i) one class where the average-error sum-capacity has an infinite derivative in the limit where CF output link capacities approach zero, and (ii) a second class of MACs where the maximal-error sum-capacity is not continuous at the point where the output link capacities of the CF equal zero. This work contrasts the power of the CF in the maximal- and average-error cases, showing that a constant number of bits communicated over the CF output link can yield a positive gain in the maximal-error sum-capacity, while a far greater number of bits, even numbers that grow sublinearly in the blocklength, can never yield a non-negligible gain in the average-error sum-capacity.
Submitted 23 November, 2019;
originally announced November 2019.
-
The Birthday Problem and Zero-Error List Codes
Authors:
Parham Noorzad,
Michelle Effros,
Michael Langberg,
Victoria Kostina
Abstract:
As an attempt to bridge the gap between the probabilistic world of classical information theory and the combinatorial world of zero-error information theory, this paper studies the performance of randomly generated codebooks over discrete memoryless channels under a zero-error list-decoding constraint. This study allows the application of tools from one area to the other. Furthermore, it leads to an information-theoretic formulation of the birthday problem, which is concerned with the probability that in a given population, a fixed number of people have the same birthday. Due to the lack of a closed-form expression for this probability when the distribution of birthdays is not uniform, the resulting expression is not simple to analyze; in the information-theoretic formulation, however, the asymptotic behavior of this probability can be characterized exactly for all distributions.
Submitted 8 December, 2018; v1 submitted 13 February, 2018;
originally announced February 2018.
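For context on the classical birthday problem mentioned in the abstract above, the following sketch numerically computes the probability that at least two of n people share a birthday when birthdays follow an arbitrary distribution. It covers only the pairwise case and is not the paper's information-theoretic formulation. It relies on the standard identity that the probability of no shared birthday equals n! times the degree-n elementary symmetric polynomial of the birthday probabilities. The function name `collision_probability` is an assumption made for illustration.

```python
from math import factorial


def collision_probability(p, n):
    """Probability that at least two of n i.i.d. draws from p coincide.

    p is a sequence of outcome probabilities summing to 1 (not necessarily
    uniform); n is the number of draws.
    """
    # e[j] is the elementary symmetric polynomial of degree j in the
    # probabilities processed so far.
    e = [1.0] + [0.0] * n
    for prob in p:
        # Update from high degree to low so each probability is used once.
        for j in range(n, 0, -1):
            e[j] += e[j - 1] * prob
    # n! * e[n] is the probability that all n draws are distinct.
    return 1.0 - factorial(n) * e[n]
```

With a uniform distribution over 365 days and n = 23, this gives the familiar value of slightly more than one half; passing a skewed distribution shows how non-uniformity raises the collision probability.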
-
Can Negligible Cooperation Increase Network Capacity? The Average-Error Case
Authors:
Parham Noorzad,
Michelle Effros,
Michael Langberg
Abstract:
In communication networks, cooperative strategies are coding schemes where network nodes work together to improve network performance metrics such as sum-rate. This work studies encoder cooperation in the setting of a discrete multiple access channel with two encoders and a single decoder. A node in the network that is connected to both encoders via rate-limited links, referred to as the cooperation facilitator (CF), enables the cooperation strategy. Previously, the authors presented a class of multiple access channels where the average-error sum-capacity has an infinite derivative in the limit where CF output link capacities approach zero. The authors also demonstrated that for some channels, the maximal-error sum-capacity is not continuous at the point where the output link capacities of the CF equal zero. This work shows that the average-error sum-capacity is continuous when CF output link capacities converge to zero; that is, the infinite derivative of the average-error sum-capacity is not a result of its discontinuity as in the maximal-error case.
Submitted 11 January, 2018;
originally announced January 2018.
-
The Benefit of Encoder Cooperation in the Presence of State Information
Authors:
Parham Noorzad,
Michelle Effros,
Michael Langberg
Abstract:
In many communication networks, the availability of channel state information at various nodes provides an opportunity for network nodes to work together, or "cooperate." This work studies the benefit of cooperation in the multiple access channel with a cooperation facilitator, distributed state information at the encoders, and full state information available at the decoder. Under various causality constraints, sufficient conditions are obtained such that encoder cooperation through the facilitator results in a gain in sum-capacity that has infinite slope in the information rate shared with the encoders. This result extends the prior work of the authors on cooperation in networks where none of the nodes have access to state information.
Submitted 18 July, 2017;
originally announced July 2017.
-
The Unbounded Benefit of Encoder Cooperation for the $k$-user MAC
Authors:
Parham Noorzad,
Michelle Effros,
Michael Langberg
Abstract:
Cooperation strategies allow communication devices to work together to improve network capacity. Consider a network consisting of a $k$-user multiple access channel (MAC) and a node that is connected to all $k$ encoders via rate-limited bidirectional links, referred to as the "cooperation facilitator" (CF). Define the cooperation benefit as the sum-capacity gain resulting from the communication between the encoders and the CF and the cooperation rate as the total rate the CF shares with the encoders. This work demonstrates the existence of a class of $k$-user MACs where the ratio of the cooperation benefit to cooperation rate tends to infinity as the cooperation rate tends to zero. Examples of channels in this class include the binary erasure MAC for $k=2$ and the $k$-user Gaussian MAC for any $k\geq 2$.
Submitted 30 September, 2016; v1 submitted 22 January, 2016;
originally announced January 2016.
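The main claim of the abstract above can be written compactly. The notation here is an assumption made for illustration and does not come from the abstract: $C_{\mathrm{sum}}(r)$ denotes the sum-capacity when the cooperation rate is $r$, treated as a scalar for simplicity, so the cooperation benefit is $C_{\mathrm{sum}}(r) - C_{\mathrm{sum}}(0)$.

```latex
\[
  \lim_{r \to 0^{+}} \frac{C_{\mathrm{sum}}(r) - C_{\mathrm{sum}}(0)}{r} = \infty
\]
```

Under this simplified notation, the result says that for the channels in the class, the sum-capacity has infinite slope at zero cooperation rate.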
-
Can Negligible Cooperation Increase Network Reliability?
Authors:
Parham Noorzad,
Michelle Effros,
Michael Langberg
Abstract:
In network cooperation strategies, nodes work together with the aim of increasing transmission rates or reliability. This paper demonstrates that enabling cooperation between the transmitters of a two-user multiple access channel, via a cooperation facilitator that has access to both messages, always results in a network whose maximal- and average-error sum-capacities are the same---even when those capacities differ in the absence of cooperation and the information shared with the encoders is negligible. From this result, it follows that if a multiple access channel with no transmitter cooperation has different maximal- and average-error sum-capacities, then the maximal-error sum-capacity of the network consisting of this channel and a cooperation facilitator is not continuous with respect to the output edge capacities of the facilitator. This shows that there exist networks where sharing even a negligible number of bits per channel use with the encoders yields a non-negligible benefit.
Submitted 30 September, 2016; v1 submitted 21 January, 2016;
originally announced January 2016.
-
The Multivariate Covering Lemma and its Converse
Authors:
Parham Noorzad,
Michelle Effros,
Michael Langberg
Abstract:
The multivariate covering lemma states that given a collection of $k$ codebooks, each of sufficiently large cardinality and independently generated according to one of the marginals of a joint distribution, one can always choose one codeword from each codebook such that the resulting $k$-tuple of codewords is jointly typical with respect to the joint distribution. We give a proof of this lemma for weakly typical sets. This allows achievability proofs that rely on the covering lemma to go through for continuous channels (e.g., Gaussian) without the need for quantization. The covering lemma and its converse are widely used in information theory, including in rate-distortion theory and in achievability results for multi-user channels.
Submitted 21 January, 2016; v1 submitted 13 August, 2015;
originally announced August 2015.
-
On the Cost and Benefit of Cooperation (Extended Version)
Authors:
Parham Noorzad,
Michelle Effros,
Michael Langberg
Abstract:
In a cooperative coding scheme, network nodes work together to achieve higher transmission rates. To obtain a better understanding of cooperation, we consider a model in which two transmitters send rate-limited descriptions of their messages to a "cooperation facilitator", a node that sends back rate-limited descriptions of the pair to each transmitter. This model includes the conferencing encoders model and a prior model from the current authors as special cases. We show that except for a special class of multiple access channels, the gain in sum-capacity resulting from cooperation under this model is quite large. Adding a cooperation facilitator to any such channel results in a network that does not satisfy the edge removal property. An important special case is the Gaussian multiple access channel, for which we explicitly characterize the sum-rate cooperation gain.
Submitted 16 April, 2015;
originally announced April 2015.
-
On the Power of Cooperation: Can a Little Help a Lot? (Extended Version)
Authors:
Parham Noorzad,
Michelle Effros,
Michael Langberg,
Tracey Ho
Abstract:
In this paper, we propose a new cooperation model for discrete memoryless multiple access channels. Unlike in prior cooperation models (e.g., conferencing encoders), where the transmitters cooperate directly, in this model the transmitters cooperate through a larger network. We show that under this indirect cooperation model, there exist channels for which the increase in sum-capacity resulting from cooperation is significantly larger than the rate shared by the transmitters to establish the cooperation. This result contrasts both with results on the benefit of cooperation under prior models and results in the network coding literature, where attempts to find examples in which similar small network modifications yield large capacity benefits have to date been unsuccessful.
Submitted 27 April, 2014; v1 submitted 25 January, 2014;
originally announced January 2014.