-
On Mixup Training: Improved Calibration and Predictive Uncertainty for Deep Neural Networks
Authors:
Sunil Thulasidasan,
Gopinath Chennupati,
Jeff Bilmes,
Tanmoy Bhattacharya,
Sarah Michalak
Abstract:
Mixup~\cite{zhang2017mixup} is a recently proposed method for training deep neural networks where additional samples are generated during training by convexly combining random pairs of images and their associated labels. While simple to implement, it has been shown to be a surprisingly effective method of data augmentation for image classification: DNNs trained with mixup show noticeable gains in classification performance on a number of image classification benchmarks. In this work, we discuss a hitherto untouched aspect of mixup training -- the calibration and predictive uncertainty of models trained with mixup. We find that DNNs trained with mixup are significantly better calibrated -- i.e., the predicted softmax scores are much better indicators of the actual likelihood of a correct prediction -- than DNNs trained in the regular fashion. We conduct experiments on a number of image classification architectures and datasets -- including large-scale datasets like ImageNet -- and find this to be the case. Additionally, we find that merely mixing features does not result in the same calibration benefit and that the label smoothing in mixup training plays a significant role in improving calibration. Finally, we also observe that mixup-trained DNNs are less prone to over-confident predictions on out-of-distribution and random-noise data. We conclude that the typical overconfidence seen in neural networks, even on in-distribution data, is likely a consequence of training with hard labels, suggesting that mixup be employed for classification tasks where predictive uncertainty is a significant concern.
Submitted 6 January, 2020; v1 submitted 27 May, 2019;
originally announced May 2019.
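The convex combination of image pairs and labels described in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function names `mixup_pair` and `mixup_batch`, the default `alpha=0.4`, the use of a single mixing weight per batch, and drawing that weight from a Beta(alpha, alpha) distribution are all assumptions made for illustration. Images are represented as flat lists of floats and labels as one-hot lists.

```python
import random


def mixup_pair(x_a, y_a, x_b, y_b, lam):
    """Convexly combine two samples and their one-hot labels with weight lam."""
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix


def mixup_batch(xs, ys, alpha=0.4, rng=None):
    """Mix every sample in a batch with a randomly chosen partner.

    alpha is a hypothetical hyperparameter; Beta(alpha, alpha) sampling
    is an assumption, not something stated in the abstract.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    partners = list(range(len(xs)))
    rng.shuffle(partners)                # random pairing within the batch
    mixed = [
        mixup_pair(xs[i], ys[i], xs[j], ys[j], lam)
        for i, j in enumerate(partners)
    ]
    return [m[0] for m in mixed], [m[1] for m in mixed], lam
```

Because each mixed label is a convex combination of two one-hot vectors, it still sums to one but is no longer a hard label, which is the label-smoothing effect the abstract links to improved calibration.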
-
Spatiotemporal Modeling of Node Temperatures in Supercomputers
Authors:
Curtis B Storlie,
Brian J Reich,
William N Rust,
Lawrence O Ticknor,
Amanda M Bonnie,
Andrew J Montoya,
Sarah E Michalak
Abstract:
Los Alamos National Laboratory (LANL) is home to many large supercomputing clusters. These clusters require an enormous amount of power (~500-2000 kW each), and most of this energy is converted into heat. Thus, cooling the components of the supercomputer becomes a critical and expensive endeavor. Recently, a project was initiated to optimize the cooling system used to cool one of the rooms housing three of these large clusters and develop a general good-practice procedure for reducing cooling costs and monitoring other machine rooms. This work focuses on the statistical approach used to quantify the effect that several cooling changes to the room had on the temperatures of the individual nodes of the computers. The largest cluster in the room has 1600 nodes that run a variety of jobs during general use. Since extreme temperatures are important, a Normal distribution plus generalized Pareto distribution for the upper tail is used to model the marginal distribution, along with a Gaussian process copula to account for spatio-temporal dependence. A Gaussian Markov random field (GMRF) model is used to model the spatial and/or temporal effects on the node temperatures as the cooling changes take place. This model is then used to assess the condition of the node temperatures after each change to the room. The analysis approach was used to uncover the cause of a problematic episode of overheating nodes on one of the supercomputing clusters. The next step is to also use the model to estimate the trend in node temperatures due to an increase in supply air temperature and ultimately decide when any further temperature increases would become unsafe. This same process can be applied to reduce the cooling expenses for other data centers as well.
Submitted 25 May, 2016; v1 submitted 23 May, 2015;
originally announced May 2015.
-
Monitoring the Sky with the Prototype All-Sky Imager on the LWA1
Authors:
K. S. Obenberger,
G. B. Taylor,
J. M. Hartman,
T. E. Clarke,
J. Dowell,
A. Dubois,
D. Dubois,
P. A. Henning,
J. Lazio,
S. Michalak,
F. K. Schinzel
Abstract:
We present a description of the Prototype All-Sky Imager (PASI), a backend correlator and imager of the first station of the Long Wavelength Array (LWA1). PASI cross-correlates a live stream of 260 dual-polarization dipole antennas of the LWA1, creates all-sky images, and uploads them to the LWA-TV website in near real-time. PASI has recorded over 13,000 hours of all-sky images at frequencies between 10 and 88 MHz, creating opportunities for new research and discoveries. We also report rate density and pulse energy density limits on transients at 38, 52, and 74 MHz, for pulse widths of 5 s. We limit transients at those frequencies with pulse energy densities of $>2.7\times 10^{-23}$, $>1.1\times 10^{-23}$, and $>2.8\times 10^{-23}$ J m$^{-2}$ Hz$^{-1}$ to have rate densities $<1.2\times10^{-4}$, $<5.6\times10^{-4}$, and $<7.2\times10^{-4}$ yr$^{-1}$ deg$^{-2}$.
Submitted 17 March, 2015;
originally announced March 2015.
-
Comparison of RFI Mitigation Strategies for Dispersed Pulse Detection
Authors:
John Hogden,
Scott Vander Wiel,
Geoffrey C. Bower,
Sarah Michalak,
Andrew Siemion,
Daniel Werthimer
Abstract:
Impulsive radio-frequency signals from astronomical sources are dispersed by the frequency-dependent index of refraction of the interstellar media and so appear as chirped signals when they reach Earth. Searches for dispersed impulses have been limited by false detections due to radio frequency interference (RFI) and, in some cases, artifacts of the instrumentation. Many authors have discussed techniques to excise or mitigate RFI in searches for fast transients, but comparisons between different approaches are lacking. This work develops RFI mitigation techniques for use in searches for dispersed pulses, employing data recorded in a "Fly's Eye" mode of the Allen Telescope Array as a test case. We gauge the performance of several RFI mitigation techniques by adding dispersed signals to data containing RFI and comparing false alarm rates at the observed signal-to-noise ratios of the added signals. We find that Huber filtering is most effective at removing broadband interferers, while frequency centering is most effective at removing narrow frequency interferers. Neither of these methods is effective over a broad range of interferers. A method that combines Huber filtering and adaptive interference cancellation provides the lowest number of false positives over the interferers considered here. The methods developed here have application to other searches for dispersed pulses in incoherent spectra, especially those involving multiple beam systems.
Submitted 6 January, 2012;
originally announced January 2012.