-
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
Authors:
Frederik Kunstner,
Robin Yadav,
Alan Milligan,
Mark Schmidt,
Alberto Bietti
Abstract:
Adam has been shown empirically to outperform gradient descent in optimizing large language transformers, and by a larger margin than on other tasks, but it is unclear why this happens. We show that the heavy-tailed class imbalance found in language modeling tasks leads to difficulties in the optimization dynamics. When training with gradient descent, the loss associated with infrequent words decreases more slowly than the loss associated with frequent ones. As most samples come from relatively infrequent words, the average loss decreases slowly with gradient descent. In contrast, Adam and sign-based methods do not suffer from this problem and improve predictions on all classes. To establish that this behavior is indeed caused by class imbalance, we show empirically that it persists across different architectures and data types: language transformers, vision CNNs, and linear models. We further study this phenomenon in linear classification with cross-entropy loss, showing that heavy-tailed class imbalance leads to ill-conditioning and that the normalization used by Adam can counteract it.
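The mechanism can be illustrated with a deliberately tiny, hypothetical toy (not the authors' experimental setup). Assume one-hot inputs (every example of class j is the j-th basis vector), a linear softmax model, and Zipf class frequencies with π_k proportional to 1/k. Under these assumptions the full-batch cross-entropy gradient for the weights used by class j is scaled by π_j, so gradient descent moves rare classes proportionally more slowly. Sign descent, used here only as a simple stand-in for Adam's normalization, takes the same-sized step for every class. The helper names `zipf_freqs` and `train` are illustrative, not from the paper.

```python
import math

def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def zipf_freqs(k):
    # Hypothetical heavy-tailed class frequencies: pi_k proportional to 1/k.
    w = [1.0 / (i + 1) for i in range(k)]
    s = sum(w)
    return [v / s for v in w]

def train(freqs, steps, lr, use_sign):
    """Full-batch training of a linear softmax model on one-hot toy data.

    W[j] is the logit vector the model outputs for inputs of class j.
    Returns the per-class cross-entropy loss after training.
    """
    k = len(freqs)
    W = [[0.0] * k for _ in range(k)]
    for _ in range(steps):
        for j, pj in enumerate(freqs):
            p = softmax(W[j])
            # Gradient of pj * CE(W[j], j): scaled by the class frequency pj.
            g = [pj * (p[c] - (1.0 if c == j else 0.0)) for c in range(k)]
            if use_sign:
                # Sign descent: keep only the direction, discarding the pj scale.
                g = [math.copysign(1.0, v) if v != 0 else 0.0 for v in g]
            W[j] = [w - lr * v for w, v in zip(W[j], g)]
    return [-math.log(softmax(W[j])[j]) for j in range(k)]

if __name__ == "__main__":
    freqs = zipf_freqs(20)
    gd = train(freqs, steps=100, lr=1.0, use_sign=False)
    sd = train(freqs, steps=100, lr=0.05, use_sign=True)
    print("GD   loss, most / least frequent class:", gd[0], gd[-1])
    print("sign loss, most / least frequent class:", sd[0], sd[-1])
```

In this toy, gradient descent leaves the rarest class with a much higher loss than the most frequent one, while sign descent drives every class's loss down identically, because its step no longer depends on π_j. The step sizes are arbitrary choices for the sketch, not tuned comparisons.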
Submitted 29 February, 2024;
originally announced February 2024.
-
Using 4MOST to refine the measurement of galaxy properties: A case study of Supernova hosts
Authors:
J. Dumayne,
I. M. Hook,
S. C. Williams,
G. A. Lowes,
D. Head,
A. Fritz,
O. Graur,
B. Holwerda,
A. Humphrey,
A. Milligan,
M. Nicholl,
B. F. Roukema,
P. Wiseman
Abstract:
The Rubin Observatory's 10-year Legacy Survey of Space and Time will observe nearly 20 billion galaxies, and the properties of each galaxy can be inferred. Approximately $10^5$ galaxies observed per year will host Type Ia supernovae (SNe), allowing SN host-galaxy properties to be measured on a large scale. Measuring the properties of SN host galaxies serves two main purposes. First, there are known correlations between host-galaxy type and supernova type, which can aid in the classification of SNe. Second, Type Ia SNe exhibit correlations between host-galaxy properties and the peak luminosities of the SNe, which has implications for their use as standardisable candles in cosmology. We have used simulations to quantify the improvement in host-galaxy stellar mass ($M_\ast$) measurements when supplementing photometry from Rubin with spectroscopy from the 4-metre Multi-Object Spectroscopic Telescope (4MOST) instrument. We provide results in the form of expected uncertainties in $M_\ast$ for galaxies with $0.1 < z < 0.9$ and $18 < r_{AB} < 25$. We show that for galaxies of magnitude 22 and brighter, combining Rubin and 4MOST data reduces the uncertainty in $M_\ast$ by more than a factor of 2 compared with Rubin data alone. This holds for both elliptical and Sc-type hosts. We demonstrate that the reduced uncertainties in $M_\ast$ lead to a 7% improvement in the precision of the "mass step" correction. We expect our improved measurements of host-galaxy properties to aid in the photometric classification of SNe observed by Rubin.
Submitted 9 August, 2023;
originally announced August 2023.
-
UNSAT Solver Synthesis via Monte Carlo Forest Search
Authors:
Chris Cameron,
Jason Hartford,
Taylor Lundy,
Tuan Truong,
Alan Milligan,
Rex Chen,
Kevin Leyton-Brown
Abstract:
We introduce Monte Carlo Forest Search (MCFS), a class of reinforcement learning (RL) algorithms for learning policies in tree MDPs, for which policy execution involves traversing an exponentially sized tree. Examples of such problems include proving unsatisfiability of a SAT formula; counting the number of solutions of a satisfiable SAT formula; and finding the optimal solution to a mixed-integer program. MCFS algorithms can be seen as extensions of Monte Carlo Tree Search (MCTS) to cases where, rather than finding a good path (solution) within a tree, the problem is to find a small tree within a forest of candidate trees. We instantiate and evaluate our ideas in an algorithm that we dub Knuth Synthesis, an MCFS algorithm that learns DPLL branching policies for solving the Boolean satisfiability (SAT) problem, with the objective of achieving good average-case performance on a given distribution of unsatisfiable problem instances. Knuth Synthesis leverages two key ideas to avoid the prohibitive costs of policy evaluations in an exponentially sized tree. First, we estimate tree size by randomly sampling paths and measuring their lengths, drawing on an unbiased approximation due to Knuth (1975). Second, we query a strong solver at a user-defined depth rather than learning a policy across the whole tree, to focus our policy search on early decisions that offer the greatest potential for reducing tree size. We matched or improved performance over a strong baseline on three well-known SAT distributions (R3SAT, sgen, satfc).
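Knuth's (1975) tree-size estimator, cited above, can be sketched as follows. As an assumption for illustration, the search tree is given as an explicit nested list (each node is the list of its children, and a leaf is `[]`); in Knuth Synthesis the tree would instead be generated on the fly by a DPLL branching policy. Walking one uniformly random root-to-leaf path and summing the running products of the branching factors gives an unbiased estimate of the number of nodes. In a binary DPLL tree every internal node has two children, so a path of length L yields 2^(L+1) - 1, which is why measuring path lengths suffices. The function names are hypothetical.

```python
import random

def knuth_estimate(tree, rng):
    """One sample of Knuth's unbiased estimator of the node count of `tree`."""
    estimate = 1.0  # counts the root
    weight = 1.0    # product of branching factors along the sampled path
    node = tree
    while node:     # an empty child list marks a leaf
        weight *= len(node)
        estimate += weight
        node = rng.choice(node)  # descend to a uniformly random child
    return estimate

def mean_estimate(tree, samples, seed=0):
    # Averaging many independent probes reduces the estimator's variance.
    rng = random.Random(seed)
    return sum(knuth_estimate(tree, rng) for _ in range(samples)) / samples

def count_nodes(tree):
    # Exact node count, used only to check the estimator.
    return 1 + sum(count_nodes(child) for child in tree)

def complete_tree(branching, depth):
    # Uniform tree: every internal node has `branching` children.
    if depth == 0:
        return []
    return [complete_tree(branching, depth - 1) for _ in range(branching)]

if __name__ == "__main__":
    irregular = [[], [[], [], [[]]]]
    print(count_nodes(irregular), mean_estimate(irregular, 20000))
```

On a uniform tree every probe returns the exact size; on an irregular tree individual probes vary, but their mean converges to the true count.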
Submitted 25 May, 2023; v1 submitted 22 November, 2022;
originally announced November 2022.
-
Air-Releasable Soft Robots for Explosive Ordnance Disposal
Authors:
Tyler C. Looney,
Nathan M. Savard,
Gus T. Teran,
Archie G. Milligan,
Ryley I. Wheelock,
Michael Scalise,
Daniel P. Perno,
Gregory C. Lewin,
Carlo Pinciroli,
Cagdas D. Onal,
Markus P. Nemitz
Abstract:
Demining landmines with drones is challenging: air-releasable payloads are typically non-intelligent (e.g., water balloons or explosives), and deploying them even at low altitudes (~6 m) is inherently inaccurate owing to complex deployment trajectories and the drone pilot's limited visual awareness. Soft robotics offers a unique approach to aerial demining, chiefly because of the robust, low-cost, and lightweight designs of soft robots. Instead of non-intelligent payloads, we propose the use of air-releasable soft robots for demining. We developed a full system consisting of an unmanned aerial vehicle retrofitted as a soft robot carrier with a custom-made deployment mechanism, and an air-releasable, lightweight (296 g), untethered soft hybrid robot with integrated electronics that incorporates a new type of vacuum-based flasher roller actuator system. We demonstrate a deployment cycle in which the drone drops the soft robotic hybrid from an altitude of 4.5 m, after which the robot approaches a dummy landmine. By deploying soft robots at points of interest, we can transition soft robotic technologies from the laboratory to real-world environments.
Submitted 7 February, 2022;
originally announced February 2022.
-
Electric field induced semiconductor-to-metal phase transition in vertical MoTe2 and Mo1-xWxTe2 devices
Authors:
Feng Zhang,
Sergiy Krylyuk,
Huairuo Zhang,
Cory A. Milligan,
Dmitry Y. Zemlyanov,
Leonid A. Bendersky,
Albert V. Davydov,
Joerg Appenzeller
Abstract:
In recent years, transition metal dichalcogenides (TMDs) have attracted attention as potential building blocks for various electronic applications owing to their atomically thin nature. An exciting development is the recent success in 'engineering' the crystal phases of TMD compounds during growth, made possible by their polymorphic character. Here, we report an electric-field-induced, reversible engineered phase transition in vertical 2H-MoTe2 devices, a crucial experimental finding that enables electrical phase switching in these ultrathin layered materials. Scanning tunneling microscopy (STM) was used to analyze the TMD crystalline structure after an electric field was applied, and scanning tunneling spectroscopy (STS) was employed to map a semiconductor-to-metal phase transition on the nanoscale. In addition, direct confirmation of a phase transition from the semiconducting 2H phase to a distorted metallic 2H' phase was obtained by scanning transmission electron microscopy (STEM). Vertical resistive random access memory (RRAM) cells based on MoTe2 and Mo1-xWxTe2 alloys were fabricated and demonstrate clear, reproducible, and controlled switching, with programming voltages that are tunable by the layer thickness and that follow a distinctly different trend for the binary compound compared with the ternary materials.
Submitted 12 September, 2017;
originally announced September 2017.