-
Should I try multiple optimizers when fine-tuning pre-trained Transformers for NLP tasks? Should I tune their hyperparameters?
Authors:
Nefeli Gkouti,
Prodromos Malakasiotis,
Stavros Toumpis,
Ion Androutsopoulos
Abstract:
NLP research has explored different neural model architectures and sizes, datasets, training objectives, and transfer learning techniques. However, the choice of optimizer during training has not been explored as extensively. Typically, some variant of Stochastic Gradient Descent (SGD) is employed, selected among numerous variants, using unclear criteria, often with minimal or no tuning of the optimizer's hyperparameters. Experimenting with five GLUE datasets, two models (DistilBERT and DistilRoBERTa), and seven popular optimizers (SGD, SGD with Momentum, Adam, AdaMax, Nadam, AdamW, and AdaBound), we find that when the hyperparameters of the optimizers are tuned, there is no substantial difference in test performance across the five more elaborate (adaptive) optimizers, despite differences in training loss. Furthermore, tuning just the learning rate is in most cases as good as tuning all the hyperparameters. Hence, we recommend picking any of the best-behaved adaptive optimizers (e.g., Adam) and tuning only its learning rate. When no hyperparameter can be tuned, SGD with Momentum is the best choice.
Submitted 10 February, 2024;
originally announced February 2024.
-
A Simple Network of Nodes Moving on the Circle
Authors:
Dimitris Cheliotis,
Ioannis Kontoyiannis,
Michail Loulakis,
Stavros Toumpis
Abstract:
Two simple Markov processes are examined, one in discrete and one in continuous time, arising from idealized versions of a transmission protocol for mobile, delay-tolerant networks. We consider two independent walkers moving with constant speed on either the discrete or continuous circle, and changing directions at independent geometric (respectively, exponential) times. One of the walkers carries a message that wishes to travel as far and as fast as possible in the clockwise direction. The message stays with its current carrier unless the two walkers meet, the carrier is moving counter-clockwise, and the other walker is moving clockwise. In that case, the message jumps to the other walker. The long-term average clockwise speed of the message is computed. An explicit expression is derived via the solution of an associated boundary value problem in terms of the generator of the underlying Markov process. The average transmission cost is also similarly computed, measured as the long-term number of jumps the message makes per unit time. The tradeoff between speed and cost is examined, as a function of the underlying problem parameters.
Submitted 4 March, 2020; v1 submitted 11 August, 2018;
originally announced August 2018.
-
Packet Speed and Cost in Mobile Wireless Delay-Tolerant Networks
Authors:
Riccardo Cavallari,
Stavros Toumpis,
Roberto Verdone,
Ioannis Kontoyiannis
Abstract:
A mobile wireless delay-tolerant network (DTN) model is proposed and analyzed, in which infinitely many nodes are initially placed on R^2 according to a uniform Poisson point process (PPP) and subsequently travel, independently of each other, along trajectories comprised of line segments, changing travel direction at time instances that form a Poisson process, each time selecting a new travel direction from an arbitrary distribution; all nodes maintain constant speed. A single information packet is traveling towards a given direction using both wireless transmissions and sojourns on node buffers, according to a member of a broad class of possible routing rules. For this model, we compute the long-term averages of the speed with which the packet travels towards its destination and the rate with which the wireless transmission cost accumulates. Because of the complexity of the problem, we employ two intuitive, simplifying approximations; simulations verify that the approximation error is typically small. Our results quantify the fundamental trade-off that exists in mobile wireless DTNs between the packet speed and the packet delivery cost. The framework developed here is both general and versatile, and can be used as a starting point for further investigation.
Submitted 28 February, 2018; v1 submitted 7 January, 2018;
originally announced January 2018.
-
Interference Functionals in Poisson Networks
Authors:
Udo Schilcher,
Stavros Toumpis,
Martin Haenggi,
Alessandro Crismani,
Günther Brandner,
Christian Bettstetter
Abstract:
We propose and prove a theorem that allows the calculation of a class of functionals on Poisson point processes that have the form of expected values of sum-products of functions. In proving the theorem, we present a variant of the Campbell-Mecke theorem from stochastic geometry. We proceed to apply our result in the calculation of expected values involving interference in wireless Poisson networks. Based on this, we derive outage probabilities for transmissions in a Poisson network with Nakagami fading. Our results extend the stochastic geometry toolbox used for the mathematical analysis of interference-limited wireless networks.
Submitted 8 June, 2015; v1 submitted 30 September, 2014;
originally announced September 2014.
-
Packet Travel Times in Wireless Relay Chains under Spatially and Temporally Dependent Interference
Authors:
Alessandro Crismani,
Udo Schilcher,
Stavros Toumpis,
Günther Brandner,
Christian Bettstetter
Abstract:
We investigate the statistics of the number of time slots $T$ that it takes a packet to travel through a chain of wireless relays. Derivations are performed assuming an interference model for which interference possesses spatiotemporal dependency properties. When using this model, results are harder to arrive at analytically, but they are more realistic than the ones obtained in many related works that are based on independent interference models.
First, we present a method for calculating the distribution of $T$. As the required computations are extensive, we also obtain simple expressions for the expected value $\mathrm{E} [T]$ and variance $\mathrm{var} [T]$. Finally, we calculate the asymptotic limit of the average speed of the packet. Our numerical results show that spatiotemporal dependence has a significant impact on the statistics of the travel time $T$. In particular, we show that, with respect to the independent interference case, $\mathrm{E} [T]$ and $\mathrm{var} [T]$ increase, whereas the packet speed decreases.
Submitted 28 April, 2014; v1 submitted 12 November, 2013;
originally announced November 2013.
-
Cooperative Relaying in Wireless Networks under Spatially and Temporally Correlated Interference
Authors:
Alessandro Crismani,
Udo Schilcher,
Günther Brandner,
Stavros Toumpis,
Christian Bettstetter
Abstract:
We analyze the performance of an interference-limited, decode-and-forward, cooperative relaying system that comprises a source, a destination, and $N$ relays, placed arbitrarily on the plane and suffering from interference by a set of interferers placed according to a spatial Poisson process. In each transmission attempt, first the transmitter sends a packet; subsequently, a single one of the relays that received the packet correctly, if such a relay exists, retransmits it. We consider both selection combining and maximal ratio combining at the destination, Rayleigh fading, and interferer mobility.
We derive expressions for the probability that a single transmission attempt is successful, as well as for the distribution of the transmission attempts until a packet is transmitted successfully. Results provide design guidelines applicable to a wide range of systems. Overall, the temporal and spatial characteristics of the interference play a significant role in shaping the system performance. Maximal ratio combining is only helpful when relays are close to the destination; in harsh environments, having many relays is especially helpful, and relay placement is critical; the performance improves when interferer mobility increases; and a tradeoff exists between energy efficiency and throughput.
Submitted 24 March, 2016; v1 submitted 2 August, 2013;
originally announced August 2013.
-
Asymptotic Capacity Bounds for Wireless Networks with Non-Uniform Traffic
Authors:
Stavros Toumpis
Abstract:
We develop bounds on the capacity of wireless networks when the traffic is non-uniform, i.e., not all nodes are required to receive and send similar volumes of traffic. Our results are asymptotic, i.e., they hold with probability going to unity as the number of nodes goes to infinity. We study (i) asymmetric networks, where the numbers of sources and destinations of traffic are unequal, (ii) multicast networks, in which each created packet has multiple destinations, (iii) cluster networks, which consist of clients and a limited number of cluster heads, and in which each client wants to communicate with any of the cluster heads, and (iv) hybrid networks, in which the nodes are supported by a limited infrastructure. Our findings quantify the fundamental capabilities of these wireless networks to handle traffic bottlenecks, and point to correct design principles that achieve the capacity without resorting to overly complicated protocols.
Submitted 23 October, 2013; v1 submitted 3 March, 2007;
originally announced March 2007.
-
Opti{c,m}al: Optical/Optimal Routing in Massively Dense Wireless Networks
Authors:
R. Catanuto,
S. Toumpis,
G. Morabito
Abstract:
We study routing for massively dense wireless networks, i.e., wireless networks that contain so many nodes that, in addition to their usual microscopic description, a novel macroscopic description becomes possible. The macroscopic description is not detailed, but nevertheless contains enough information to permit a meaningful study and performance optimization of the network. Within this context, we continue and significantly expand previous work on the analogy between optimal routing and the propagation of light according to the laws of Geometrical Optics. Firstly, we pose the analogy in a more general framework than previously, notably showing how the eikonal equation, which is the central equation of Geometrical Optics, also appears in the networking context. Secondly, we develop a methodology for calculating the cost function, which is the function describing the network at the macroscopic level. We apply this methodology for two important types of networks: bandwidth limited and energy limited.
Submitted 23 October, 2013; v1 submitted 2 August, 2006;
originally announced August 2006.